CAP Study Guide Flashcards

Question

In a normal distribution, _____ percent of the data values are within one standard deviation of the mean.

Answer 1

is a statistical technique used in market research to determine how people value different attributes (feature, function, benefits) that make up an individual product or service.

Answer 2

the degree of linear correlation between variables; it is computed with statistical methods such as the chi-square test or the coefficient of determination

Answer 3

Explained variation / Total variation
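This ratio is usually called the coefficient of determination (R-squared). As a rough illustration, the sketch below fits a one-predictor least-squares line and returns explained variation divided by total variation. The function name `r_squared` and the sample numbers are hypothetical, chosen only for the example.

```python
def r_squared(xs, ys):
    """Explained variation / total variation for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept for y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    # Variation of the fitted values around the mean vs. variation of the data.
    explained = sum((f - mean_y) ** 2 for f in fitted)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


# Hypothetical data: a perfect line and a noisier set.
perfect_fit = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
noisy_fit = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

For a least-squares fit with an intercept the result always lies between 0 and 1, that is, between 0% and 100%.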

Answer 4

0 and 100%

Answer 5

the number of predictors in the model.

Answer 6

increases, even if due to chance alone. It never decreases.

Answer 7

model the random noise in the data. This condition is known as overfitting.

Answer 8

how well a regression model predicts responses for new observations. This statistic helps you determine when the model fits the original data but is less capable of providing valid predictions for new observations.

Answer 9

applies autoregressive moving average ARMA or ARIMA models to find the best fit of a time-series model to past values of a time series.

Answer 10

autoregressive moving average

Answer 11

is a very simple, fast and surprisingly accurate method for grouping objects into clusters. All objects are represented as a point in a multidimensional feature space. The algorithm uses a fast approximate distance metric and two distance thresholds T1 > T2 for processing.
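The description above matches what is commonly called canopy clustering. A minimal sketch, assuming a caller-supplied cheap distance function and thresholds T1 > T2, might look like the following. The name `canopy_clusters` and the way the next center is chosen (simply the first remaining point) are assumptions for illustration, not part of the card.

```python
def canopy_clusters(points, t1, t2, distance):
    """Group points into possibly overlapping canopies using thresholds T1 > T2."""
    if not t1 > t2:
        raise ValueError("expected T1 > T2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Assumption: take the first remaining point as the next canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                # Loosely close: the point joins this canopy.
                members.append(p)
            if d >= t2:
                # Not tightly close: the point may still seed or join other canopies.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Hypothetical one-dimensional example with absolute difference as the distance.
example = canopy_clusters([0.0, 0.5, 2.0, 10.0], t1=3.0, t2=1.0,
                          distance=lambda a, b: abs(a - b))
```

Points within T2 of a center are removed from further consideration, while points between T2 and T1 can belong to more than one canopy.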

Answer 12

the K-means algorithm or the Hierarchical clustering algorithm.

Answer 13

large data sets

Answer 14

a predetermined ordering from top to bottom.

Answer 15

we assign all of the observations to a single cluster and then partition the cluster into the two least similar clusters. Finally, we proceed recursively on each cluster until there is one cluster for each observation.

Answer 16

we assign each observation to its own cluster. Then we compute the similarity (e.g., distance) between each pair of clusters and join the two most similar clusters. Finally, we repeat the compute and join steps until only a single cluster is left.
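The bottom-up procedure described here can be sketched in a few lines. This is a naive illustration that assumes single linkage (cluster distance is the smallest pairwise distance), which the card itself does not specify. The name `agglomerative` and the `target_clusters` stopping parameter are hypothetical.

```python
def agglomerative(points, distance, target_clusters=1):
    """Naive bottom-up clustering; returns final clusters and the merge history."""
    # Step 1: every observation starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > target_clusters:
        best = None
        # Step 2: find the two most similar (closest) clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Step 3: join them, then repeat.
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical one-dimensional data; stop early at three clusters.
three, _ = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0],
                         distance=lambda a, b: abs(a - b),
                         target_clusters=3)
```

With the default `target_clusters=1` the loop runs until a single cluster remains, and the merge history is what a dendrogram would draw.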

Answer 17

summary measure of an individual’s beliefs about whether an event occurs

Answer 18

Intentionally sampling from subpopulations to reduce sampling error for low frequency groups

Answer 19

Box-Cox transformations

Answer 20

Categorical

Answer 21

Likert-type

Answer 22

Semantic differential

Answer 23

Rank-order

Answer 24

Are all the fields of the data complete?

Answer 25

Is the data accurate?

Answer 26

Is the data provided under a given field and for a given concept consistent with the definition of that field and concept?

Answer 27

Is the data obsolete?

Answer 28

Is the data based on one opinion or on a consensus of experts in the relative area?

Answer 29

Is the data secure from unauthorized use by individuals other than the decision maker?

Answer 30

Is the data legible and comprehensible?

Answer 31

Is the data in a format easily used in the application for which it is intended?

Answer 32

Can the data be conveniently and quickly accessed by the intended user in a time-frame that allows for it to be effectively used?

Answer 33

Is the cost of collecting and using the data commensurate with its value?

Answer 34

Staging area, centralized data, access layers (multiple OLAP data marts)

Answer 35

Organized along a single point of view (e.g. time, product type, geography) for efficient data retrieval

Answer 36

filtering data by picking a specific subset of the data-cube and choosing a single value for one of its dimensions;

Answer 37

grouping data by picking specific values for multiple dimensions

Answer 38

allow the user to navigate from the most summarized (high-level) to the most detailed (drill-down);

Answer 39

summarize the data along a dimension (e.g., computing totals or using some other formula);

Answer 40

interchange rows and columns ("rotate the cube").
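The last few cards describe data-cube operations that are commonly called slice, dice, drill-down/drill-up, roll-up, and pivot. The sketch below mimics them on a tiny in-memory table. The `sales` records, the dimension names, and all function names are hypothetical; a real OLAP engine would work very differently under the hood.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions (year, region, product), one measure.
sales = [
    {"year": 2023, "region": "East", "product": "A", "units": 10},
    {"year": 2023, "region": "West", "product": "A", "units": 7},
    {"year": 2023, "region": "East", "product": "B", "units": 4},
    {"year": 2024, "region": "East", "product": "A", "units": 12},
    {"year": 2024, "region": "West", "product": "B", "units": 9},
]


def slice_cube(rows, dim, value):
    """Slice: fix a single value for one dimension."""
    return [r for r in rows if r[dim] == value]


def dice_cube(rows, allowed):
    """Dice: keep rows whose values fall in the allowed sets for several dimensions."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def roll_up(rows, dims, measure="units"):
    """Roll-up: total the measure over the listed dimensions.

    Passing more dimensions gives a more detailed view (drill-down);
    passing fewer gives a more summarized view (drill-up).
    """
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def pivot(table):
    """Pivot: swap rows and columns of a two-dimensional summary."""
    return {(col, row): value for (row, col), value in table.items()}


by_year = roll_up(sales, ["year"])                    # most summarized
by_year_region = roll_up(sales, ["year", "region"])   # drilled down one level
by_region_year = pivot(by_year_region)                # rotated view
```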

Answer 41

identifying a set of features on a small sample and then testing that set on a holdout sample.

Answer 42

Situations or models containing a random element, hence unpredictable and without a stable pattern or order. All natural events are stochastic phenomena.

Answer 43

Discrete event simulation

Answer 44

Discrete event simulation

Answer 45

Discrete event simulation

Answer 46

Queuing model

Answer 47

Queuing modeling is not needed

Answer 48

Monte Carlo simulation

Answer 49

System dynamics (SD)

Answer 50

Game theory.

Answer 51

Probability

Answer 52

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution.
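As a small worked example, the probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). The sketch below computes it with the standard library. The function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two heads in four fair coin flips.
two_heads = binomial_pmf(2, 4, 0.5)

# With n = 1 the distribution reduces to a Bernoulli distribution.
bernoulli_success = binomial_pmf(1, 1, 0.3)
```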

Answer 53

distribution that arises naturally in processes for which the waiting times between events are relevant. In particular, the arrival times in the Poisson process have gamma distributions, and the chi-square distribution in statistics is a special case of the gamma distribution. Also, the gamma distribution is widely used to model physical quantities that take positive values.

Answer 54

a discrete frequency distribution that gives the probability of a number of independent events occurring in a fixed time.
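This description corresponds to the Poisson distribution, whose probability of observing k events is rate^k e^(-rate) / k!, where rate is the expected number of events in the interval. A minimal sketch, with the hypothetical function name `poisson_pmf`:

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k events when `rate` events are expected in the interval."""
    return rate**k * exp(-rate) / factorial(k)


# Hypothetical example: an average of 2 arrivals per hour, chance of no arrivals.
no_arrivals = poisson_pmf(0, 2.0)
```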

Answer 55

receiver operating characteristic

Answer 56

graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied.

Answer 57

plotting the true positive rate against the false positive rate at various threshold settings
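To make the construction concrete, the sketch below sweeps a threshold over classifier scores and records one (false positive rate, true positive rate) point per threshold. The name `roc_points` and the sample scores are hypothetical, and the code assumes labels are 0 or 1 with at least one of each.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs as the decision threshold is lowered."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # A threshold above every score predicts nothing positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores for two positive and two negative cases.
curve = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```

Plotting these pairs with FPR on the horizontal axis and TPR on the vertical axis gives the curve.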

Answer 58

sensitivity in signal detection and biomedical informatics, or recall in machine learning

Answer 59

Specificity

Answer 60

False positive

Answer 61

False negative

Answer 62

reduces the uncertainty by half

Answer 63

2000 observations in the smaller of the two target classes.

Answer 64

a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model
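One common way to compute this ratio, sketched below under stated assumptions, is to compare the response rate among the top-scored cases with the overall response rate. The name `lift`, the `top_fraction` parameter, and the sample data are hypothetical.

```python
def lift(scores, labels, top_fraction):
    """Response rate in the top-scored fraction divided by the overall response rate."""
    n = len(labels)
    k = max(1, round(n * top_fraction))
    # Rank cases by model score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    rate_with_model = sum(y for _, y in ranked[:k]) / k
    # Without the model, a random selection responds at the overall rate.
    baseline_rate = sum(labels) / n
    return rate_with_model / baseline_rate


# Hypothetical scores and outcomes for ten cases, 3 of which responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
top_20_lift = lift(scores, labels, 0.2)
```

A lift above 1 means the model-selected group responds more often than a randomly selected group of the same size.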

Answer 65

The concept arises in decision theory and decision analysis in situations where one gamble (a probability distribution over possible outcomes, also known as prospects) can be ranked as superior to another gamble for a broad class of decision-makers. It is based on shared preferences regarding sets of possible outcomes and their associated probabilities. Only limited knowledge of preferences is required for determining dominance.

Answer 66

same or better level than your data accuracy (e.g. +/- 10% if your data is less than +/- 20%)

Answer 67

Fuzzy logic

Answer 68

Underlying assumptions change

Answer 69

take a potential solution to a problem and check its immediate neighbors (that is, solutions that are similar except for one or two minor details) in the hope of finding an improved solution. Local search methods have a tendency to become stuck in suboptimal regions or on plateaus where many solutions are equally fit.

Answer 70

is a metaheuristic search method employing local search methods used for mathematical optimization. Tabu search enhances the performance of local search by relaxing its basic rule. First, at each step worsening moves can be accepted if no improving move is available (as when the search is stuck at a strict local minimum). In addition, prohibitions (hence the term tabu) are introduced to discourage the search from coming back to previously visited solutions.
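The two ideas in this card, accepting worsening moves and forbidding recently visited solutions, can be sketched as follows. This is a simplified illustration and not a full tabu search implementation: the names `tabu_search`, `landscape`, and `line_neighbors` are hypothetical, and the tabu list here stores whole solutions rather than move attributes.

```python
from collections import deque


def tabu_search(objective, start, neighbors, iterations=50, tabu_size=5):
    """Minimize `objective`, moving to the best non-tabu neighbor each step."""
    current = start
    best = start
    # Recently visited solutions are forbidden for `tabu_size` steps.
    tabu = deque([start], maxlen=tabu_size)
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        # Take the best allowed neighbor even if it is worse than the current one.
        current = min(candidates, key=objective)
        tabu.append(current)
        if objective(current) < objective(best):
            best = current
    return best


# Hypothetical landscape: index 1 is a local minimum, index 7 is the global one.
landscape = [5, 3, 4, 6, 7, 5, 2, 0, 1, 4, 8]


def line_neighbors(x):
    """Neighbors are the adjacent indices that stay inside the landscape."""
    return [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]


found = tabu_search(lambda x: landscape[x], start=0, neighbors=line_neighbors)
```

A plain local search started at index 0 would stop at index 1, since both of its neighbors are worse; the tabu list is what pushes the search past that point.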

Answer 71

is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs.

(101 cards)