Booz Selection Flashcards

1
Q

What is the key question for 1. Describe?

A

How do I develop an understanding of the content of my data?

2
Q

What is the key question for 1. Describe | Processing?

A

How do I clean and separate my data?

3
Q

What is the key question for 1. Describe | Processing | Filtering?

A

How do I identify data based on its absolute or relative values?

4
Q

What is the key question for 1. Describe | Processing | Imputation?

A

How do I fill in missing values in my data?

5
Q

What is the key question for 1. Describe | Processing | Dimensionality Reduction?

A

How do I reduce the number of dimensions in my data?

6
Q

What is the key question for 1. Describe | Processing | Normalization & Transformation?

A

How do I reconcile duplicate representations in the data?

7
Q

What is the key question for 1. Describe | Processing | Feature Extraction?

A

There is no single key question: feature extraction depends heavily on the domain of the data, and a variety of methods apply.

8
Q

For 1. Describe | Processing | Filtering, If you want to add or remove data based on its value, start with:

A

Relational algebra projection and selection
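
A minimal sketch of the two operators in Python, assuming a relation is held as a list of dicts (the `select`/`project` names and the sensor rows are hypothetical). Selection (σ) keeps rows that satisfy a predicate; projection (π) keeps chosen columns and, under set semantics, drops duplicate rows.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection (sigma): keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def project(rows: Iterable[Row], columns: List[str]) -> List[Row]:
    """Projection (pi): keep only the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple((col, row[col]) for col in columns)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(projected))
    return result


sensors = [
    {"id": 1, "site": "A", "temp": 21.5},
    {"id": 2, "site": "B", "temp": 35.0},
    {"id": 3, "site": "A", "temp": 36.2},
]

hot = select(sensors, lambda r: r["temp"] > 30)
print(project(hot, ["site"]))  # [{'site': 'B'}, {'site': 'A'}]
```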

9
Q

For 1. Describe | Processing | Filtering, If early results are uninformative and duplicative, start with:

A

Outlier removal, Exponential smoothing, Gaussian filter, Median filter
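
Two of the four filters sketched in plain Python, assuming a list of numeric readings; the window size and `alpha` are arbitrary illustrative choices, and outlier removal and the Gaussian filter are not shown.

```python
from statistics import median
from typing import List


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1 - alpha)*s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        return []
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def median_filter(values: List[float], window: int = 3) -> List[float]:
    """Replace each point with the median of a centered window (truncated at the ends)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


signal = [1.0, 1.0, 9.0, 1.0, 1.0]               # a single spike
print(median_filter(signal))                     # [1.0, 1.0, 1.0, 1.0, 1.0]
print(exponential_smoothing(signal, alpha=0.5))  # [1.0, 1.0, 5.0, 3.0, 2.0]
```

The median filter removes the spike outright, while exponential smoothing spreads it over the following points; that contrast is one reason to try more than one filter.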

10
Q

For 1. Describe | Processing | Imputation, If you want to generate values from other observations in your dataset, start with:

A

Random sampling, Markov chain Monte Carlo (MCMC)
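
Random sampling (hot-deck style) is the simpler of the two. A minimal sketch, assuming missing entries are marked with `None` in a single numeric column; the function name and seed are illustrative, and MCMC-based imputation is not shown.

```python
import random
from typing import List, Optional


def random_sample_impute(values: List[Optional[float]], seed: int = 0) -> List[float]:
    """Fill each missing entry (None) with a value drawn at random from the
    observed entries of the same column."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [v if v is not None else rng.choice(observed) for v in values]


ages = [34.0, None, 29.0, 41.0, None]
filled = random_sample_impute(ages)
print(filled)
```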

11
Q

For 1. Describe | Processing | Imputation, If you want to generate values without using other observations in your dataset, start with:

A

Mean, Statistical distributions, Regression models

12
Q

For 1. Describe | Processing | Dimensionality Reduction, If you need to determine whether there is multi-dimensional correlation, start with:

A

PCA and other factor analysis methods

13
Q

For 1. Describe | Processing | Dimensionality Reduction, If you can represent individual observations by membership in a group, start with:

A

K-means clustering, Canopy clustering
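
A minimal, dependency-free sketch of k-means (Lloyd's algorithm). The farthest-first initialization is an assumption chosen to keep the example deterministic; scikit-learn, for example, defaults to k-means++ seeding. Canopy clustering is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid updates."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, labels


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```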

14
Q

For 1. Describe | Processing | Dimensionality Reduction, If you have unstructured text data, start with:

A

Term Frequency/Inverse Document Frequency (TF-IDF)
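
A minimal TF-IDF sketch, assuming documents are already tokenized into word lists. It uses term count divided by document length for TF and an unsmoothed log(N / df) for IDF; libraries often use smoothed variants, so absolute weights will differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / doc length) * log(N / df), where df is the
    number of documents that contain the term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({term: (count / total) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(docs)
print(sorted(w[0].items(), key=lambda kv: -kv[1]))
```

Terms shared across documents ("the", "sat") are down-weighted relative to distinctive ones ("cat", "mat"), which is what makes the weights useful as a compact text representation.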

15
Q

For 1. Describe | Processing | Dimensionality Reduction, If you have a variable number of features but your algorithm requires a fixed number, start with:

A

Feature hashing
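
A minimal sketch of the hashing trick, assuming features arrive as name/value pairs. CRC-32 from the standard library stands in for the hash function (an assumption; scikit-learn's `FeatureHasher`, for instance, uses MurmurHash3), and the bucket count of 16 is arbitrary.

```python
import zlib
from typing import Dict, List


def hash_features(features: Dict[str, float], n_buckets: int = 16) -> List[float]:
    """Project any number of named features into a fixed-length vector."""
    vector = [0.0] * n_buckets
    for name, value in features.items():
        h = zlib.crc32(name.encode("utf-8"))    # deterministic, unlike Python's salted hash() for str
        index = h % n_buckets                   # low bits choose the bucket
        sign = -1.0 if h & 0x80000000 else 1.0  # high bit chooses the sign
        vector[index] += sign * value
    return vector


short_doc = hash_features({"word=cat": 1.0})
long_doc = hash_features({"word=cat": 1.0, "word=dog": 2.0, "length": 6.0})
print(len(short_doc), len(long_doc))  # 16 16
```

With more distinct features than buckets, unrelated features share slots; the hashed sign keeps those collisions from pushing sums consistently in one direction.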

16
Q

For 1. Describe | Processing | Dimensionality Reduction, If you are not sure which features are the most important, start with:

A

Wrapper methods, Sensitivity analysis

17
Q

For 1. Describe | Processing | Dimensionality Reduction, If you need to facilitate understanding of the probability distribution of the space, start with:

A

Self-organizing maps

18
Q

For 1. Describe | Processing | Normalization & Transformation, If you suspect duplicate data elements, start with:

A

Deduplication
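
A minimal exact-key deduplication sketch, assuming records are dicts with hypothetical `name` and `email` fields. The normalization step (trim and lowercase) decides what counts as a duplicate; real record linkage often needs fuzzy matching on top of this.

```python
from typing import Dict, Hashable, Iterable, List


def normalize_key(record: Dict[str, str]) -> Hashable:
    """Comparison key: trim whitespace and ignore case so trivially different
    spellings of the same entity collide."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def deduplicate(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


people = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
print(len(deduplicate(people)))  # 2
```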

19
Q

For 1. Describe | Processing | Normalization & Transformation, If you want your data to fall within a specified range, start with:

A

Normalization
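
Min-max scaling is one common form of normalization. A minimal sketch, assuming a plain list of numbers and a target range that defaults to [0, 1]:

```python
from typing import List


def min_max_scale(values: List[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Linearly rescale so the minimum maps to `low` and the maximum to `high`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]  # constant column: no spread to rescale
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


print(min_max_scale([10.0, 15.0, 20.0]))         # [0.0, 0.5, 1.0]
print(min_max_scale([10.0, 15.0, 20.0], -1, 1))  # [-1.0, 0.0, 1.0]
```

A constant column is mapped to the lower bound rather than dividing by zero; mapping it to the midpoint is an equally defensible convention.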

20
Q

For 1. Describe | Processing | Normalization & Transformation, If your data is stored in a binary format, start with:

A

Format Conversion

21
Q

For 1. Describe | Processing | Normalization & Transformation, If you are operating in frequency space, start with:

A

Fast Fourier Transform (FFT), Discrete wavelet transform
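
A minimal recursive radix-2 Cooley-Tukey FFT, checked against the direct O(n²) DFT. It assumes the input length is a power of two; the discrete wavelet transform is not shown.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Radix-2 Cooley-Tukey FFT: split into even/odd halves and combine."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct DFT from the definition, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 1.0]
spectrum = fft(signal)
print([round(abs(c), 3) for c in spectrum])
```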

22
Q

For 1. Describe | Processing | Normalization & Transformation, If you are operating in Euclidean space, start with:

A

Coordinate transform

23
Q

What is the key question for 1. Describe | Aggregation?

A

How do I collect and summarize my data?

24
Q

For 1. Describe | Aggregation, If you are unfamiliar with the dataset, start with:

A

Basic statistics: count, mean, standard deviation, range, scatter plots, box plots
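
The numeric summaries can be computed with Python's standard `statistics` module; a minimal sketch (plots are left to a charting library):

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Quick first-look summary of a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "range": max(values) - min(values),
    }


print(describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```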

25
Q

For 1. Describe | Aggregation, If your approach assumes the data follows a distribution, start with:

A

Distribution fitting

26
Q

For 1. Describe | Aggregation, If you want to understand all the information available on an entity, start with:

A

“Baseball card” aggregation

27
Q

What is the key question for 1. Describe | Enrichment?

A

How do I add new information to my data?

28
Q

For 1. Describe | Enrichment, If you need to keep track of source information or other user-defined parameters, start with:

A

Annotation

29
Q

For 1. Describe | Enrichment, If you often process certain data fields together or use one field to compute the value of another, start with:

A

Relational algebra rename, Feature addition (e.g., Geography, Technology, Weather)

30
Q

What is the key question for 2. Discover?

A

What are the key relationships in the data?

31
Q

What is the key question for 2. Discover | Clustering?

A

How do I segment the data to find natural groupings?

32
Q

For 2. Discover | Clustering, If you want an ordered set of clusters with variable precision, start with:

A

Hierarchical

33
Q

For 2. Discover | Clustering, If you have an unknown number of clusters, start with:

A

X-means, Canopy, Apriori

34
Q

For 2. Discover | Clustering, If you have text data, start with:

A

Topic modeling

35
Q

For 2. Discover | Clustering, If you have non-elliptical clusters, start with:

A

Fractal, DBSCAN

36
Q

For 2. Discover | Clustering, If you want soft membership in the clusters, start with:

A

Gaussian mixture models

37
Q

For 2. Discover | Clustering, If you have a known number of clusters, start with:

A

K-means

38
Q

What is the key question for 2. Discover | Regression?

A

How do I determine which variables may be important?

39
Q

For 2. Discover | Regression, If your data has unknown structure, start with:

A

Tree-based methods

40
Q

For 2. Discover | Regression, If statistical measures of importance are needed, start with:

A

Generalized linear models

41
Q

For 2. Discover | Regression, If statistical measures of importance are not needed, start with:

A

Regression with shrinkage (e.g., LASSO, Elastic net), Stepwise regression

42
Q

What is the key question for 2. Discover | Hypothesis Testing?

A

How do I test ideas?

43
Q

For 2. Discover | Hypothesis Testing, If you want to compare two groups, start with:

A

T-test

44
Q

For 2. Discover | Hypothesis Testing, If you want to compare multiple groups, start with:

A

ANOVA

45
Q

What is the key question for 3. Predict?

A

What are the likely future outcomes?

46
Q

What is the key question for 3. Predict | Classification?

A

How do I predict group membership?

47
Q

For 3. Predict | Classification, If you have known dependent relationships between variables, start with:

A

Bayesian network

48
Q

For 3. Predict | Classification, If you are unsure of feature importance, start with:

A

Neural nets, Random forests, Deep learning

49
Q

For 3. Predict | Classification, If you require a highly transparent model, start with:

A

Decision trees

50
Q

For 3. Predict | Classification, If you have fewer than 20 data dimensions, start with:

A

K-nearest neighbors
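
A brute-force k-nearest-neighbors classifier sketch, assuming numeric feature tuples and Euclidean distance (`k = 3` and the training points are illustrative). The low-dimension guideline on this card reflects that distance comparisons become less informative as the number of dimensions grows.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 3) -> str:
    """Label a query point by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((6.0, 6.0), "high"), ((7.0, 7.5), "high"), ((6.5, 7.0), "high"),
]
print(knn_classify(train, (1.2, 1.4)))  # low
print(knn_classify(train, (6.8, 6.9)))  # high
```

For the regression card, the same neighbor search applies with the vote replaced by the mean of the neighbors' values.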

51
Q

For 3. Predict | Classification, If you have a large dataset with an unknown classification signal, start with:

A

Naive Bayes

52
Q

For 3. Predict | Classification, If you want to estimate an unobservable state based on observable variables, start with:

A

Hidden Markov model

53
Q

For 3. Predict | Classification, If you don’t know where else to begin, start with:

A

Support vector machines (SVM), Random forests

54
Q

What is the key question for 3. Predict | Regression?

A

How do I predict a future value?

55
Q

For 3. Predict | Regression, If the data structure is unknown, start with:

A

Tree-based methods

56
Q

For 3. Predict | Regression, If you require a highly transparent model, start with:

A

Generalized linear models

57
Q

For 3. Predict | Regression, If you have fewer than 20 data dimensions, start with:

A

K-nearest neighbors

58
Q

What is the key question for 3. Predict | Recommendation?

A

How do I predict relevant conditions?

59
Q

For 3. Predict | Recommendation, If you only have knowledge of how people interact with items, start with:

A

Collaborative filtering
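
A minimal item-based collaborative filtering sketch, assuming explicit ratings stored as `user -> {item: rating}` (the users, items, and ratings are hypothetical). Item similarity is the cosine between the items' rating vectors across users, with unrated entries treated as zero.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings: Ratings, user: str, item: str) -> float:
    """Estimate the user's rating of `item` as a similarity-weighted average
    of the user's ratings of other items."""
    # Invert to item -> {user: rating} so items can be compared by who rated them.
    by_item: Dict[str, Dict[str, float]] = {}
    for u, items in ratings.items():
        for i, r in items.items():
            by_item.setdefault(i, {})[u] = r
    target = by_item.get(item, {})
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        sim = cosine(target, by_item[other])
        num += sim * r
        den += abs(sim)
    return num / den if den else 0.0


ratings = {
    "ann": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bob": {"film_a": 4, "film_b": 5},
    "cy": {"film_a": 1, "film_c": 5},
    "dee": {"film_b": 4, "film_c": 2},
}
print(predict(ratings, "bob", "film_c"))
```

Because the similarities are non-negative here, the prediction always lies between the user's lowest and highest existing ratings; mean-centering ratings before computing similarity is a common refinement.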

60
Q

For 3. Predict | Recommendation, If you have a feature vector of item characteristics, start with:

A

Content-based methods

61
Q

For 3. Predict | Recommendation, If you only have knowledge of how items are connected to one another, start with:

A

Graph-based methods

62
Q

What is the key question for 4. Advise?

A

What course of action should I take?

63
Q

What is the key question for 4. Advise | Logical Reasoning?

A

How do I sort through different evidence?

64
Q

For 4. Advise | Logical Reasoning, If you have expert knowledge to capture, start with:

A

Expert systems

65
Q

For 4. Advise | Logical Reasoning, If you’re looking for basic facts, start with:

A

Logical reasoning

66
Q

What is the key question for 4. Advise | Optimization?

A

How do I identify the best course of action when my objective can be expressed as a utility function?

67
Q

For 4. Advise | Optimization, If your problem is represented by a non-deterministic utility function, start with:

A

Stochastic search

68
Q

For 4. Advise | Optimization, If approximate solutions are acceptable, start with:

A

Genetic algorithms, Simulated annealing, Gradient search
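
Simulated annealing in a few lines, minimizing a one-dimensional function. This is a sketch under stated assumptions: the linear cooling schedule, step size, and test function are arbitrary choices, and a real problem would replace the scalar move with its own neighborhood move. Genetic algorithms and gradient search are not shown.

```python
import math
import random
from typing import Callable


def simulated_annealing(f: Callable[[float], float], x0: float, steps: int = 5000,
                        t0: float = 1.0, step_size: float = 0.5, seed: int = 0) -> float:
    """Minimize f: always accept improving moves, and accept a worsening move of
    size delta with probability exp(-delta / T) while the temperature T cools."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, f_best = x, fx
    for i in range(steps):
        temperature = t0 * (1 - i / steps)  # linear cooling; stays positive for t0 > 0
        candidate = x + rng.uniform(-step_size, step_size)
        f_cand = f(candidate)
        delta = f_cand - fx
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            x, fx = candidate, f_cand
            if fx < f_best:
                best, f_best = x, fx
    return best


def bumpy(x: float) -> float:
    """A multimodal test function; its global minimum is near x = -0.30."""
    return x * x + 2 * math.sin(5 * x)


x_best = simulated_annealing(bumpy, x0=3.0)
print(x_best, bumpy(x_best))
```

Accepting some uphill moves early lets the search escape the local minima of `bumpy`; as the temperature falls it behaves more and more like greedy descent.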

69
Q

For 4. Advise | Optimization, If your problem is represented by a deterministic utility function, start with:

A

Linear programming, Integer programming, Non-linear programming

70
Q

For 4. Advise | Optimization, If you have limited resources to search with, start with:

A

Active learning

71
Q

For 4. Advise | Optimization, If you want to try multiple models, start with:

A

Ensemble learning

72
Q

What is the key question for 4. Advise | Simulation?

A

How do I characterize a system that does not have a closed-form representation?

73
Q

For 4. Advise | Simulation, If you must model discrete entities, start with:

A

Discrete event simulation (DES)

74
Q

For 4. Advise | Simulation, If there is a discrete set of possible states, start with:

A

Markov models
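
A minimal discrete-state Markov model sketch, assuming a hypothetical three-state weather chain. It samples a trajectory and estimates the long-run (stationary) distribution by power iteration.

```python
import random
from typing import Dict, List

# Hypothetical chain: P[a][b] is the probability of moving from state a to state b.
P: Dict[str, Dict[str, float]] = {
    "sunny": {"sunny": 0.8, "cloudy": 0.15, "rainy": 0.05},
    "cloudy": {"sunny": 0.3, "cloudy": 0.5, "rainy": 0.2},
    "rainy": {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},
}


def simulate(P: Dict[str, Dict[str, float]], start: str, n_steps: int, seed: int = 0) -> List[str]:
    """Sample a path in which the next state depends only on the current state."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        row = P[path[-1]]
        path.append(rng.choices(list(row), weights=list(row.values()))[0])
    return path


def stationary(P: Dict[str, Dict[str, float]], max_iter: int = 10_000,
               tol: float = 1e-12) -> Dict[str, float]:
    """Long-run probabilities by repeatedly applying pi[b] = sum_a pi[a] * P[a][b]."""
    pi = {s: 1.0 / len(P) for s in P}
    for _ in range(max_iter):
        new = {b: sum(pi[a] * P[a][b] for a in P) for b in P}
        if max(abs(new[s] - pi[s]) for s in P) < tol:
            return new
        pi = new
    return pi


print(simulate(P, "sunny", 10))
print(stationary(P))
```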

75
Q

For 4. Advise | Simulation, If there are actions and interactions among autonomous entities, start with:

A

Agent-based simulation

76
Q

For 4. Advise | Simulation, If you do not need to model discrete entities, start with:

A

Monte Carlo simulation
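
A minimal Monte Carlo sketch: estimate the chance that three sequential tasks overrun a deadline by sampling their durations many times. The triangular distributions and their parameters are hypothetical.

```python
import random


def estimate_overrun(deadline: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(total duration > deadline) for three sequential tasks whose
    durations follow (hypothetical) triangular distributions."""
    rng = random.Random(seed)
    tasks = [(2.0, 5.0, 3.0), (1.0, 4.0, 2.0), (3.0, 8.0, 4.0)]  # (low, high, mode) in days
    overruns = 0
    for _ in range(trials):
        total = sum(rng.triangular(low, high, mode) for low, high, mode in tasks)
        if total > deadline:
            overruns += 1
    return overruns / trials


print(estimate_overrun(12.0))
```

The estimate's sampling error shrinks roughly with the square root of the number of trials, so quadrupling `trials` about halves it.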

77
Q

For 4. Advise | Simulation, If you are modeling a complex system with feedback mechanisms between actions, start with:

A

System dynamics

78
Q

For 4. Advise | Simulation, If you require continuous tracking of system behavior, start with:

A

Activity-based simulation

79
Q

For 4. Advise | Simulation, If you already have an understanding of what factors govern the system, start with:

A

Ordinary differential equations (ODEs), Partial differential equations (PDEs)
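
When the governing equations are known, simulation reduces to numerical integration. A minimal sketch of the classic fourth-order Runge-Kutta (RK4) method for a single ODE, applied to logistic growth with hypothetical parameters; PDE solvers are not shown.

```python
import math
from typing import Callable


def rk4(f: Callable[[float, float], float], y0: float, t0: float, t1: float, n: int) -> float:
    """Integrate dy/dt = f(t, y) from t0 to t1 using n RK4 steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


R, K = 0.5, 100.0  # hypothetical growth rate and carrying capacity


def logistic(t: float, y: float) -> float:
    """Logistic growth: dy/dt = R * y * (1 - y / K)."""
    return R * y * (1 - y / K)


numeric = rk4(logistic, y0=10.0, t0=0.0, t1=10.0, n=100)
exact = K / (1 + (K - 10.0) / 10.0 * math.exp(-R * 10.0))  # closed-form logistic solution
print(numeric, exact)
```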

80
Q

For 4. Advise | Simulation, If you have imprecise categories, start with:

A

Fuzzy logic
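
Fuzzy logic replaces hard category boundaries with degrees of membership. A minimal sketch, assuming triangular membership functions and two hypothetical temperature categories; a single reading can belong partly to both.

```python
from typing import Dict, Tuple


def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Membership degree in [0, 1] for a triangular fuzzy set that rises from a,
    peaks at b, and falls back to zero at c (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy categories for temperature in degrees Celsius.
CATEGORIES: Dict[str, Tuple[float, float, float]] = {
    "cool": (10.0, 17.0, 24.0),
    "warm": (18.0, 25.0, 32.0),
}

reading = 22.0
print({name: round(triangular_membership(reading, *abc), 3) for name, abc in CATEGORIES.items()})
# {'cool': 0.286, 'warm': 0.571}
```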