Analytic Techniques Flashcards

1
Q

what technique would you use if you needed to group items or find structure?

a) regression
b) clustering
c) time series

A

b) clustering

2
Q

what technique would you use if you needed to discover relationships between actions or items?

a) text analysis
b) regression
c) classification
d) association rules

A

d) association rules

3
Q

what technique would you use if you needed to determine the relationship between the input variables and the outcome?

a) text analysis
b) regression
c) Time series

A

b) regression

4
Q

what technique would you use if you needed to assign labels to objects?

a) classification
b) text analysis
c) regression

A

a) classification

5
Q

what technique would you use if you needed to find structure in temporal data in order to make forecasts?

a) classification
b) text analysis
c) time series

A

c) time series

6
Q

what technique would you use if you needed to analyse free text?

a) time series
b) clustering
c) classification
d) text analysis

A

d) text analysis

7
Q

which technique is used for clustering?

A

k-means

8
Q

which techniques are used for regression?

A

linear and logistic

9
Q

which techniques are used for classification?

A

naive bayes

decision trees

10
Q

which technique is used for association rules?

A

apriori

11
Q

which techniques are used for time series?

A

ARMA, ARIMA, PACF & ACF

12
Q

which techniques are used for text analysis?

A

regular expressions
bag of words
TF-IDF

13
Q

which methods are unsupervised learning methods?

A

k-means

apriori

14
Q

what is the output of k-means?

A

the cluster centre

15
Q

what is the input of k-means?

A

numerical - Euclidean distance

16
Q

what is Euclidean distance?

A

the ordinary straight-line distance between two points: the square root of the sum of the squared coordinate differences
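
As a quick illustration of the card above, here is a minimal sketch of the Euclidean distance calculation. The function name `euclidean_distance` is made up for this example.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two numeric points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square each coordinate difference, sum them, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0 (the 3-4-5 right triangle)
```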

17
Q

if a domain does not suggest a suitable value for k then what do you do?

A

plot wss and look for elbow
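
To make the elbow idea concrete, the sketch below computes the within sum of squares (WSS) for several values of k on a toy data set. It is a bare-bones k-means written for illustration only: the helper names, the toy points, and the choice to seed centroids with the first k points are all assumptions, not part of the cards.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; seeding with the first k points is an assumption."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def wss(points, centroids):
    """Sum of each point's squared distance to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Three well-separated toy groups of three points each (made-up data).
points = [(1, 1), (8, 8), (1, 8), (1, 2), (8, 9), (1, 9), (2, 1), (9, 8), (2, 9)]
wss_by_k = {k: wss(points, kmeans(points, k)) for k in range(1, 5)}
for k, value in wss_by_k.items():
    print(k, round(value, 2))
```

On this data the WSS drops sharply up to k = 3 and only slightly afterwards, which is the "elbow" one would look for in a plot.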

18
Q

in k-means, what do you do if expected splits are missing?

A

increase k

19
Q

in k-means, what do you do if some clusters have very few data points?

A

decrease k

20
Q

in k-means what do you do if the centroids are close together?

A

decrease k

21
Q

what is the right description for apriori?

a) if y is observed, then x is also observed
b) if x is observed, then y is also observed

A

b) if x is observed, then y is also observed

22
Q

what is association rule mining sometimes referred to as?

a) market analysis
b) market basket analysis
c) task basket analysis

A

b) market basket analysis

23
Q

what is a frequent itemset for apriori?

A

set of items that appear together “often enough”
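
One way to picture "often enough" is a minimum support threshold. The sketch below counts how often each pair of items appears together and keeps the pairs that meet the threshold. It is a brute-force count for illustration (the real Apriori algorithm prunes candidates level by level), and the baskets and function name are invented.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, size):
    """Return itemsets of the given size whose support is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so the same itemset always produces the same tuple key.
        for itemset in combinations(sorted(set(basket)), size):
            counts[itemset] += 1
    n = len(transactions)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
frequent_pairs = frequent_itemsets(baskets, min_support=0.5, size=2)
print(frequent_pairs)
```

Here (bread, milk) and (bread, butter) each appear in 2 of 4 baskets, so both reach a support of 0.5.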

24
Q

what is normally the support % for apriori? (confidence)

A

50%

25
Q

what is confidence in apriori?

A

% of transactions that contain x that also contain y

26
Q

in apriori, what does lift mean?

A

how many times more often x and y occur together than expected if they were statistically independent

27
Q

in apriori, what does leverage mean?

A

measures the difference between the probability of x and y appearing together and the probability expected if x and y were statistically independent

28
Q

how do you work out confidence with apriori? for example, for the rule credit good -> job skilled, where
credit good = 700
credit good and job skilled = 544

a) 700/544
b) 544/700

A

b) 544/700

29
Q

what is a test set?

A

hold back some baskets with few random values removed - can the rules fill in the blanks

30
Q

how do you work out the lift if there are 713 home owners, 527 of whom have good credit, and 700 have good credit overall (out of 1,000 records)?

a) 0.527/(0.700*0.713)
b) 527/(0.700*713)
c) 527/713

A

a) 0.527/(0.700*0.713)
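
The arithmetic behind confidence, lift and leverage can be sketched as follows, using the counts from the card. The total of 1,000 records is an assumption implied by the decimals 0.527, 0.700 and 0.713; the function name is made up.

```python
def rule_metrics(n_total, n_x, n_y, n_xy):
    """Confidence, lift and leverage for the rule x -> y from raw counts."""
    support_x = n_x / n_total
    support_y = n_y / n_total
    support_xy = n_xy / n_total
    confidence = n_xy / n_x                       # share of x records that also have y
    lift = support_xy / (support_x * support_y)   # ratio to the independent case
    leverage = support_xy - support_x * support_y # difference from the independent case
    return confidence, lift, leverage

# x = home owner (713), y = good credit (700), both = 527, assumed total = 1000.
confidence, lift, leverage = rule_metrics(1000, 713, 700, 527)
print(round(confidence, 3), round(lift, 3), round(leverage, 4))
```

A lift just above 1 and a leverage just above 0 both indicate that home ownership and good credit co-occur only slightly more often than independence would predict.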

31
Q

what does regression do?

a) looks at a variable between inputs and the outcome
b) looks at the relationship between a set of variables and the outcome
c) looks at the relationship between a set of outputs

A

b) looks at the relationship between a set of variables and the outcome

32
Q

what is linear regression?

A

used to estimate a continuous value as a linear function of the input variables

33
Q

in regression, what does OLS stand for?

A

Ordinary least squares

34
Q

in regression, what does OLS do?

A

finds the best fit line

35
Q

what is the p-value in regression?

a) p-value can be used to look for numeric input values
b) p-value can be used to determine if the coefficient is not significantly different from zero.
c) p-value can be used to determine if the coefficient is significantly different from zero.

A

c) p-value can be used to determine if the coefficient is significantly different from zero.

36
Q

what does a large p-value mean?

a) null hypothesis is rejected
b) null hypothesis is not rejected

A

b) null hypothesis is not rejected

37
Q

what are residuals in regression?

a) the similarities between the observed and the estimated outcomes
b) the differences between the observed and the estimated outcomes

A

b) the differences between the observed and the estimated outcomes
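
The OLS and residual cards can be tied together with a small sketch: fit a best-fit line with the closed-form least-squares formulas for one input variable, then compute the residuals as observed minus estimated. The data points and function name are invented for illustration.

```python
def ols_fit(xs, ys):
    """Closed-form ordinary least squares for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up observations
intercept, slope = ols_fit(xs, ys)

# Residual = observed outcome minus the outcome estimated by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(round(intercept, 2), round(slope, 2))
print([round(r, 2) for r in residuals])
```

With an intercept in the model, OLS residuals sum to zero (up to floating-point error), which is a handy sanity check.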

38
Q

what is logistic regression?

A

used to estimate the probability that an event will occur (probability borrower will default)

39
Q

what can logistic regression also be considered as?

A

classifier

40
Q

what is the standard threshold of logistic regression?

A

0.5 (50%)
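
A minimal sketch of how the 0.5 threshold turns a logistic regression probability into a class label. The weights, bias and feature values below are hypothetical, not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    """Return (probability, label); label is True when probability >= threshold."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, probability >= threshold

# Hypothetical one-feature model: z = -1.0 + 1.0 * x
print(classify([2.0], weights=[1.0], bias=-1.0))  # probability above 0.5, label True
print(classify([0.0], weights=[1.0], bias=-1.0))  # probability below 0.5, label False
```

The threshold is a parameter here because it can be moved away from 0.5 when false positives and false negatives carry different costs.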

41
Q

What is the preferred method for binary classification problems?

A

Logistic regression

42
Q

Which is not a binary classification problem?
a) true/false
b) approve/deny
c) respond to medical treatment/not respond
d) confidence/lift

A

d) confidence/lift

43
Q

what does pseudo-R2 mean?

a) 1 - (deviance/null deviance)
b) r squared
c) square root

A

a) 1 - (deviance/null deviance)

44
Q

what is naive Bayes?

A

determine the most probable class label for each object

45
Q

what is naive Bayes based on?

A

Bayes' law

46
Q

what is naive Bayes used for?

a) spam filtering
b) scoring
c) fraud
d) text analysis

A

a) spam filtering

c) fraud detection

47
Q

what is this?

P(C | A)P(A) = P(A | C)P(C) = P(A ^ C).

A

Bayes' law

48
Q

to build the naive Bayes classifier what do you need?

A

the probability of each class label, and the conditional probability of each attribute value given each class label

49
Q

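in naive Bayes, how do you classify something?

A

for each class (e.g. good/bad), multiply together the conditional probabilities of the observed attribute values given that class, then multiply by the overall probability of the class; assign the class with the highest score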
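
The scoring procedure can be sketched on a tiny categorical data set. This is an illustrative version without smoothing, so an attribute value never seen with a class drives that class's score to zero; the rows, labels and function names are all invented.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class labels and, per class, the values seen for each attribute."""
    class_counts = Counter(labels)
    # value_counts[label][attribute_index][value] = count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    return class_counts, value_counts

def predict(row, class_counts, value_counts):
    """Score each class as prior * product of conditionals; return the best."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            score *= value_counts[label][i][value] / count  # P(value | class)
        scores[label] = score
    return max(scores, key=scores.get), scores

# Made-up training data: (housing, job) -> credit label
rows = [("own", "skilled"), ("own", "skilled"), ("rent", "skilled"),
        ("own", "unskilled"), ("rent", "unskilled"), ("rent", "skilled")]
labels = ["good", "good", "good", "good", "bad", "bad"]
class_counts, value_counts = train(rows, labels)

label_a, scores_a = predict(("own", "skilled"), class_counts, value_counts)
label_b, scores_b = predict(("rent", "unskilled"), class_counts, value_counts)
print(label_a, scores_a)
print(label_b, scores_b)
```

The scores are not normalised probabilities; only their relative size matters when picking the label.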

50
Q

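what is a confusion matrix?

A

a table of predicted versus actual class counts (true/false positives and negatives), from which rates such as TPR and FPR are calculated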
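
As a sketch of how TPR and FPR come out of a confusion matrix, the example below counts the four cells for a binary classifier and derives both rates. The actual and predicted labels are made up.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification result."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)

tpr = tp / (tp + fn)  # true positive rate: share of actual positives caught
fpr = fp / (fp + tn)  # false positive rate: share of actual negatives flagged
print(tp, fp, tn, fn, tpr, fpr)
```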

51
Q

where are decision trees found?

A

data mining applications

52
Q

what are the two types of decision trees?

A

classification trees

regression trees

53
Q

what is a classification tree?

A

segment observations into homogeneous groups

54
Q

what is a regression tree?

A

a tree that estimates a continuous outcome; the average value of the observations at each leaf node is returned

55
Q

what is a branch of decision tree?

A

outcome of decision

56
Q

what is an internal node of decision tree?

A

test points

57
Q

what is a leaf node of a decision tree?

A

end of the last branch

58
Q

when should you use a decision tree?

A

when if-then rules are preferred to a linear model

59
Q

what is a weak learner (decision trees)

A

short decision tree

60
Q

in decision trees how do you get the most informative attribute?

A

entropy based methods

61
Q

what is this for? and what does it mean?

H_credit = -(0.7 log2(0.7) + 0.3 log2(0.3)) = 0.88 (very close to 1)

A

base entropy (decision tree)

high entropy

62
Q

what does conditional entropy do in decision trees?

A

attribute values give more information about the class membership

63
Q

what is information gain?

A

difference between base and conditional entropy
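
The entropy cards can be checked numerically. The sketch below reproduces the base entropy for a 70/30 class split and then computes an information gain for one hypothetical attribute. The attribute's split (60% of records with a 90/10 class mix, 40% with a 40/60 mix) is invented, chosen only so that it is consistent with the overall 70/30 split.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Base entropy of the class label (70% good, 30% bad).
base_entropy = entropy([0.7, 0.3])

# Hypothetical attribute with two values:
# (share of records with that value, class distribution within that value)
branches = [
    (0.6, [0.9, 0.1]),
    (0.4, [0.4, 0.6]),
]

# Conditional entropy: entropy within each branch, weighted by branch size.
conditional_entropy = sum(weight * entropy(dist) for weight, dist in branches)

information_gain = base_entropy - conditional_entropy
print(round(base_entropy, 2))        # 0.88
print(round(conditional_entropy, 2))
print(round(information_gain, 2))
```

The attribute with the largest information gain would be the first candidate for the tree split.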

64
Q

if you have a high information gain what does that mean?

A

first variable for tree split

65
Q

which classifier for these questions:

do I want class probabilities or just class labels?

A

logistic regression

decision tree

66
Q

which classifier for these questions:

do I want insight into how the variables affect the model?

A

logistic regression

decision tree

67
Q

which classifier for these questions:

is the problem high dimensional?

A

naive bayes

68
Q

which classifier for these questions:

do I suspect some of the inputs are correlated?

A

decision trees

logistic regression

69
Q

which classifier for these questions:

do I suspect some of the inputs are irrelevant?

A

decision tree

naive bayes

70
Q

which classifier for these questions:

are there categorical variables with a large number of levels?

A

naive bayes

decision tree

71
Q

which classifier for these questions:

are there mixed variable types?

A

decision tree

logistic regression

72
Q

which classifier for these questions:

are there non-linear elements or discontinuities in the data?

A

decision tree

73
Q

what is time series analysis?

A

analysis of values recorded at equally spaced points in time

74
Q

what does time series analysis do?

A

forecast

75
Q

what is the difference between univariate time series and multivariate time series?

A

univariate tracks a single variable over time; multivariate tracks two or more variables

76
Q

in time series what is the Box-Jenkins method?

A

a methodology for fitting ARMA/ARIMA models to a time series in order to forecast future values

77
Q

what does ARMA stand for?

A

autoregressive moving average

78
Q

who invented ARMA model?

A

Box and Jenkins

79
Q

what does the box-jenkins method assume the random component is?

A

stationary sequence

80
Q

what does a stationary sequence mean?

a) constant variance
b) autocorrelation does not change
c) constant deviance
d) constant mean

A

constant variance
autocorrelation does not change
constant mean

81
Q

to obtain a stationary sequence the data must be?

A

de-trended

seasonally adjusted

82
Q

what does the ARIMA model do?

A

uses differencing to render the data stationary

83
Q

how do you remove a simple linear trend in time series?

A

subtracting least-squares-fit straight line

84
Q

how do you do a seasonal adjustment for time series?

A

calculating the average for each month and subtracting them from the actual value
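
A minimal sketch of that seasonal adjustment, assuming monthly data that starts in the first month of a cycle. The two years of values are synthetic (a repeating monthly pattern on top of a level of 100, then 110), and the function name is made up.

```python
def seasonally_adjust(values, period=12):
    """Subtract each position-in-cycle average (e.g. each month's mean) from the data."""
    seasonal_means = []
    for m in range(period):
        same_month = values[m::period]  # every value that falls in month m
        seasonal_means.append(sum(same_month) / len(same_month))
    return [v - seasonal_means[i % period] for i, v in enumerate(values)]

# Synthetic data: the same monthly pattern in both years, with a higher level in year 2.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3]
values = [100 + s for s in pattern] + [110 + s for s in pattern]

adjusted = seasonally_adjust(values)
print(adjusted)
```

After the adjustment the monthly pattern is gone and only the year-to-year level difference remains (-5 for every month of year 1, +5 for every month of year 2).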

85
Q

what model uses p, q in time series?

A

ARMA

86
Q

in AR what is Y?

a) Yt is a linear combination of its last p values
b) Yt is a linear combination of its last q values

A

a) Yt is a linear combination of its last p values

87
Q

in MA what is Y?

a) Yt is a constant value plus the effects of a dampened white noise process over the last p time values (lags)
b) Yt is a constant value plus the effects of a dampened white noise process over the last q time values (lags)

A

b) Yt is a constant value plus the effects of a dampened white noise process over the last q time values (lags)

88
Q

What is the d in ARIMA (p,d,q)?

A

differencing term
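
What differencing does can be shown in a few lines: each pass replaces the series with the changes between consecutive values, and d is the number of passes. The function name and the sample series are made up.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

linear = [3, 5, 7, 9, 11]       # linear trend
quadratic = [1, 4, 9, 16, 25]   # quadratic trend

print(difference(linear, d=1))     # [2, 2, 2, 2]
print(difference(quadratic, d=1))  # [3, 5, 7, 9]
print(difference(quadratic, d=2))  # [2, 2, 2]
```

One pass flattens a linear trend; a quadratic trend needs two passes, which is why d is chosen to match how strongly the series trends.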

89
Q

what does ARIMA stand for?

A

autoregressive integrated moving average

90
Q

what does p mean in time series (ARMA, ARIMA)?

A

number of autoregressive terms

91
Q

what does d mean in time series (ARMA, ARIMA)?

A

the number of differences

92
Q

what does q mean in time series (ARMA, ARIMA)?

A

the number of moving average terms

93
Q

in time series, what does ACF mean?

A

auto correlation function

94
Q

what is ACF?

A

provides indication of the stationarity of the data
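
A minimal sketch of the sample autocorrelation function, using the common estimator that divides each lagged sum by the lag-0 sum of squared deviations. The function name and the alternating test series are invented for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    result = []
    for lag in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t + lag] - mean)
                  for t in range(n - lag))
        result.append(num / denom)
    return result

alternating = [1, -1, 1, -1, 1, -1]
acf_values = acf(alternating, max_lag=2)
print([round(r, 3) for r in acf_values])
```

For this series the lag-1 value is strongly negative and the lag-2 value strongly positive. An ACF that decays only slowly across many lags is the usual hint that a series is not stationary.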

95
Q

in time series, what does PACF mean?

A

partial auto correlation function

96
Q

what is PACF?

A

autocorrelation calculated after removing the linear dependence of the previous terms

97
Q

what is text analysis?

A

processing of text

98
Q

why is text analysis high-dimensional?

A

every word is a dimension

99
Q

what are the three problem solving tasks in text analysis?

A

parsing
search/retrieval
text-mining

100
Q

what is parsing in text analysis?

A

imposing structure

101
Q

what is search/retrieval in text analysis?

A

searching for word or phrase

102
Q

what is a corpus?

A

a collection of text documents (a body of knowledge)

103
Q

what is text-mining in text analysis?

A

understanding the content

104
Q

what is regex (regular expressions) in text analysis?

A

used for finding words, strings or patterns in text
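
For example, regular expressions can pull structured patterns out of free text. The sample sentence and the two patterns below are illustrative only; the email pattern is deliberately simple and not a full validator.

```python
import re

text = "Contact sales@example.com or support@example.org by 2024-05-01."

# Simple email-like pattern: name, "@", domain, ".", suffix.
emails = re.findall(r"[\w.+-]+@[\w-]+\.\w+", text)

# ISO-style dates: four digits, two digits, two digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(emails)  # ['sales@example.com', 'support@example.org']
print(dates)   # ['2024-05-01']
```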

105
Q

what is bag of words in text analysis?

A

represents a document as its words and their counts (term frequency, tf), ignoring word order

106
Q

what is reverse index in text analysis?

A

a list of all the documents that contain that feature
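
A reverse (inverted) index can be sketched as a mapping from each term to the set of documents that contain it. The document ids, texts and function name are made up, and tokenising by whitespace is a simplifying assumption.

```python
from collections import defaultdict

def build_reverse_index(documents):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

documents = {
    "d1": "big data analytics",
    "d2": "data science",
    "d3": "big ideas",
}
index = build_reverse_index(documents)
print(sorted(index["data"]))  # ['d1', 'd2']
print(sorted(index["big"]))   # ['d1', 'd3']
```

Looking up a term is then a single dictionary access instead of a scan over every document.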

107
Q

what is IDF in text analysis?

A

inverse document frequency

108
Q

what are the metrics in text analysis that determine the quality of results?

a) recall, relevance, confidence
b) relevance, precision, recall
c) relevance, lift, recall

A

b) relevance, precision, recall
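
Precision and recall can be computed from two sets: the documents a search returned and the documents that are actually relevant. The document ids and function name below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d2", "d3", "d4"]  # what the search returned
relevant = ["d1", "d3", "d5"]         # what it should have returned
precision, recall = precision_recall(retrieved, relevant)
print(precision, round(recall, 3))
```

Here two of the four returned documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.667).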

109
Q

what does IDF do?

A

measures the uniqueness of a term in the corpus

110
Q

what does tf-idf mean?

A

measure of relevance
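
To show how term frequency and inverse document frequency combine, here is a minimal TF-IDF sketch. It uses raw counts for tf and log(N / document frequency) for idf, which is one common variant among several; the corpus and function name are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: tf * idf} dict per document."""
    bags = [Counter(doc.lower().split()) for doc in documents]  # bag of words per doc
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for bag in bags for term in bag)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in bag.items()}
        for bag in bags
    ]

corpus = ["the cat sat", "the dog sat", "the cat ran home"]
scores = tf_idf(corpus)
print(scores[2])
```

A term that appears in every document ("the") scores zero, while a term unique to one document ("home") scores highest there, which is why TF-IDF works as a relevance measure.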