Analytic Techniques Flashcards

(110 cards)

1
Q

what technique would you use if you needed to group items or find structure?

a) regression
b) clustering
c) time series

A

b) clustering

2
Q

what technique would you use if you needed to discover relationships between actions or items?

a) text analysis
b) regression
c) classification
d) association rules

A

d) association rules

3
Q

what technique would you use if you needed to determine the relationship between the input variables and the outcome?

a) text analysis
b) regression
c) time series

A

b) regression

4
Q

what technique would you use if you needed to assign labels to objects?

a) classification
b) text analysis
c) regression

A

a) classification

5
Q

what technique would you use if you needed to find structure in temporal data in order to make forecasts?

a) classification
b) text analysis
c) time series

A

c) time series

6
Q

what technique would you use if you needed to analyse free text?

a) time series
b) clustering
c) classification
d) text analysis

A

d) text analysis

7
Q

which technique is used for clustering?

A

k-means

8
Q

which techniques are used for regression?

A

linear and logistic

9
Q

which techniques are used for classification?

A

naive bayes

decision trees

10
Q

which technique is used for association rules?

A

apriori

11
Q

which techniques are used for time series?

A

ARMA, ARIMA, PACF & ACF

12
Q

which techniques are used for text analysis?

A

regular expressions
bag of words
TF-IDF

13
Q

which methods are unsupervised learning methods?

A

k-means

apriori

14
Q

what is the output of k-means?

A

the cluster centres (centroids) and the cluster each point is assigned to

15
Q

what is the input of k-means?

A

numerical attributes, so that Euclidean distance can be computed

16
Q

what is Euclidean distance?

A

the ordinary straight-line distance between two points: the square root of the sum of squared coordinate differences
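
A minimal sketch of that calculation, assuming points are given as equal-length tuples of numbers (the function name `euclidean_distance` is illustrative, not from the cards):

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```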

17
Q

if a domain does not suggest a suitable value for k then what do you do?

A

plot the within-cluster sum of squares (WSS) against k and look for the elbow
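
A rough sketch of the WSS-versus-k idea in plain Python. The toy points, the function names and the farthest-first initialisation are all assumptions chosen to keep the example small and deterministic; a real analysis would more likely use a library implementation and an actual plot.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_centroids(points, k, iterations=50):
    # Farthest-first initialisation (an assumption, used here for determinism).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                updated.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dims)))
            else:
                updated.append(centroids[i])  # keep the old centre if a cluster empties
        if updated == centroids:
            break
        centroids = updated
    return centroids

def wss(points, centroids):
    # Within-cluster sum of squares: squared distance of each point to its nearest centre.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Three tight, hypothetical groups of points.
points = [(1, 1), (1.2, 0.8), (0.8, 1.1),
          (5, 5), (5.1, 4.9), (4.9, 5.2),
          (9, 1), (9.1, 0.9), (8.8, 1.2)]

wss_by_k = {k: wss(points, kmeans_centroids(points, k)) for k in range(1, 6)}
for k, value in wss_by_k.items():
    print(k, round(value, 3))
```

For this toy data the WSS drops sharply up to k = 3 and then flattens, which is the "elbow" the card refers to.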

18
Q

in k-means what do you do if it is missing expected splits?

A

increase k

19
Q

in k-means what do you do if some clusters have only a few data points?

A

decrease k

20
Q

in k-means what do you do if the centroids are close together?

A

decrease k

21
Q

what is the right description for apriori?

a) if y is observed, then x is also observed
b) if x is observed, then y is also observed

A

b) if x is observed, then y is also observed

22
Q

what are association rules sometimes referred to as?

a) market analysis
b) market basket analysis
c) task basket analysis

A

b) market basket analysis

23
Q

what is a frequent itemset for apriori?

A

set of items that appear together “often enough”

24
Q

what is normally the support % for apriori? (confidence)

A

50%

25
what is confidence in apriori?
% of transactions that contain x that also contain y
26
in apriori, what does lift mean?
how many times more often x and y occur together than would be expected if they were independent
27
in apriori, what does leverage mean?
measures the difference between the probability of x and y appearing together and the probability expected if x and y were independent
28
how do you work out confidence with apriori? for example: credit good = 700, job skilled = 544 a) 700/544 b) 544/700
b) 544/700
29
what is a test set?
hold back some baskets with few random values removed - can the rules fill in the blanks
30
how do you work out lift if there are 713 home owners, 527 of whom have good credit, and 700 people have good credit overall (the options assume 1000 records)? a) 0.527/(0.700*0.713) b) 527/(0.700*713) c) 527/713
a) 0.527/(0.700*0.713)
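The confidence, lift and leverage cards above can be checked with a short sketch. It uses the home-owner counts from this card and assumes 1000 records in total, which is what the 0.527, 0.700 and 0.713 in the answer imply; the function and key names are illustrative.

```python
def rule_metrics(n_total, n_x, n_y, n_xy):
    """Metrics for the rule x -> y, computed from raw transaction counts."""
    support_x = n_x / n_total
    support_y = n_y / n_total
    support_xy = n_xy / n_total
    return {
        "support": support_xy,
        # Share of transactions containing x that also contain y.
        "confidence": n_xy / n_x,
        # Observed co-occurrence relative to what independence would give.
        "lift": support_xy / (support_x * support_y),
        # Observed co-occurrence minus what independence would give.
        "leverage": support_xy - support_x * support_y,
    }

# x = home owner (713), y = good credit (700), both together (527), 1000 records assumed.
metrics = rule_metrics(n_total=1000, n_x=713, n_y=700, n_xy=527)
for name, value in metrics.items():
    print(name, round(value, 4))
```

A lift above 1 and a leverage above 0 both indicate that x and y appear together more often than independence would predict.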
31
what does regression do? a) looks at a variable between inputs and the outcome b) looks at the relationship between a set of variables and the outcome c) looks at the relationship between a set of outputs
b) looks at the relationship between a set of variables and the outcome
32
what is linear regression?
used to estimate a continuous value as a linear function of the input variables
33
in regression, what does OLS stand for?
Ordinary least squares
34
in regression, what does OLS do?
finds the best fit line
35
what is the p-value in regression? a) p-value can be used to look for numeric input values b) p-value can be used to determine if the coefficient is significantly not different than zero. c) p-value can be used to determine if the coefficient is significantly different than zero.
c) p-value can be used to determine if the coefficient is significantly different than zero.
36
what does a large p-value mean? a) null hypothesis is rejected b) null hypothesis is not rejected
b) null hypothesis is not rejected
37
what are residuals in regression? a) the similarities between the observed and the estimated outcomes b) the differences between the observed and the estimated outcomes
b) the differences between the observed and the estimated outcomes
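As a sketch of how OLS finds the best-fit line and what the residuals are, here is the single-input case worked out by hand. The data values and function names are made up for illustration.

```python
def ols_fit(xs, ys):
    """Slope and intercept of the least-squares line for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    # Observed outcome minus estimated outcome for each observation.
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical noisy observations
slope, intercept = ols_fit(xs, ys)
print(round(slope, 3), round(intercept, 3))
print([round(r, 3) for r in residuals(xs, ys, slope, intercept)])
```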
38
what is logistic regression?
used to estimate the probability that an event will occur (probability borrower will default)
39
what can logistic regression also be considered as?
classifier
40
what is the standard threshold of logistic regression?
0.5 (50%)
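A small sketch of how the threshold turns a logistic regression probability into a class label. The coefficients and input values are hypothetical, not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_probability(features, weights, bias):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(probability, threshold=0.5):
    # Label 1 (event occurs) when the probability reaches the threshold.
    return 1 if probability >= threshold else 0

weights, bias = [0.8, -1.5], -0.2      # hypothetical coefficients
p = predict_probability([2.0, 0.5], weights, bias)
print(round(p, 3), classify(p))
```

Lowering the threshold labels more cases as positive, raising it labels fewer, so 0.5 is a default rather than a rule.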
41
What is the preferred method for binary classification problems?
Logistic regression
42
Which is not a binary classification problem? A) true/false B) approve/deny C) respond to medical treatment/not respond D) confidence/lift
D) confidence/lift
43
what does pseudo-R2 mean? a) 1 - (deviance/null deviance) b) r squared c) square root
a) 1 - (deviance/null deviance)
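Pseudo-R2 is commonly computed as 1 - (deviance / null deviance). The sketch below assumes binary outcomes coded 0/1 and fitted probabilities strictly between 0 and 1; the numbers are invented.

```python
import math

def deviance(actual, predicted):
    """-2 times the log-likelihood of 0/1 outcomes under the predicted probabilities."""
    return -2 * sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(actual, predicted))

def pseudo_r2(actual, predicted):
    # The null model predicts the overall event rate for every observation.
    base_rate = sum(actual) / len(actual)
    null_deviance = deviance(actual, [base_rate] * len(actual))
    return 1 - deviance(actual, predicted) / null_deviance

actual = [1, 0, 1, 1, 0, 0]
fitted = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1]   # hypothetical model output
print(round(pseudo_r2(actual, fitted), 3))
```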
44
what is naive Bayes?
determine the most probable class label for each object
45
what is naive Bayes based on?
Bayes' law
46
what is naive Bayes used for? a) spam filtering b) scoring c) fraud d) text analysis
a) spam filtering | c) fraud
47
what is this? | P(C | A)*P(A) = P(A | C)*P(C) = P(A ^ C).
Bayes' law
48
to build the naive Bayes classifier what do you need?
probability of all class labels, plus the conditional probability of each attribute value given each class label
49
in naive Bayes how to classify something?
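for each class (e.g. good/bad), multiply together the conditional probabilities of the observed attribute values given that class, then multiply by the overall probability of the class; assign the class with the highest score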
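A sketch of that scoring procedure on a tiny, invented credit table (two categorical attributes, labels good/bad). No smoothing is applied, which a real classifier would usually add so that unseen values do not zero out a score.

```python
from collections import Counter, defaultdict

def train(rows):
    """Count class labels and attribute values per class."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)   # (label, attribute) -> counts of values
    for features, label in rows:
        for attribute, value in features.items():
            value_counts[(label, attribute)][value] += 1
    return class_counts, value_counts

def class_scores(features, class_counts, value_counts):
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                      # prior P(class)
        for attribute, value in features.items():  # times each P(value | class)
            score *= value_counts[(label, attribute)][value] / count
        scores[label] = score
    return scores

def classify(features, class_counts, value_counts):
    scores = class_scores(features, class_counts, value_counts)
    return max(scores, key=scores.get)

rows = [  # hypothetical training data
    ({"housing": "own", "job": "skilled"}, "good"),
    ({"housing": "own", "job": "skilled"}, "good"),
    ({"housing": "own", "job": "unskilled"}, "good"),
    ({"housing": "rent", "job": "skilled"}, "good"),
    ({"housing": "rent", "job": "unskilled"}, "bad"),
    ({"housing": "rent", "job": "unskilled"}, "bad"),
    ({"housing": "own", "job": "unskilled"}, "bad"),
    ({"housing": "rent", "job": "skilled"}, "bad"),
]
model = train(rows)
print(class_scores({"housing": "own", "job": "skilled"}, *model))
print(classify({"housing": "own", "job": "skilled"}, *model))
```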
50
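what is a confusion matrix?
a table of predicted against actual classes (true/false positives and negatives), from which TPR and FPR are calculated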
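A sketch of the four confusion-matrix counts and the two rates derived from them, assuming labels coded as 1 (positive) and 0 (negative). The label lists are made up.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

def rates(counts):
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])  # true positive rate
    fpr = counts["FP"] / (counts["FP"] + counts["TN"])  # false positive rate
    return tpr, fpr

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts, rates(counts))
```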
51
where are decision trees found?
data mining applications
52
what are the two types of decision trees?
classification trees | regression trees
53
what is a classification tree?
segment observations into homogeneous groups
54
what is a regression tree?
a variation of regression: the average outcome value of each leaf node is returned
55
what is a branch of decision tree?
outcome of decision
56
what is an internal node of decision tree?
test points
57
what is a leaf node of a decision tree?
end of the last branch
58
when should you use a decision tree?
when if-then is preferred to a linear model
59
what is a weak learner (decision trees)
short decision tree
60
in decision trees how do you get the most informative attribute?
entropy based methods
61
what is this for? and what does it mean? | H(credit) = -(0.7 log2(0.7) + 0.3 log2(0.3)) = 0.88 (very close to 1)
base entropy (decision tree) | high entropy
62
what does conditional entropy do in decision trees?
attribute values give more information about the class membership
63
what is information gain?
difference between base and conditional entropy
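The entropy cards above can be reproduced with a short sketch: base entropy of the labels, conditional entropy after splitting on one attribute, and their difference as information gain. The 0.7/0.3 split comes from the earlier card; the other values and the function names are invented.

```python
import math
from collections import Counter

def entropy(probabilities):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def label_entropy(labels):
    n = len(labels)
    return entropy(count / n for count in Counter(labels).values())

def information_gain(values, labels):
    """Base entropy minus the entropy remaining after splitting on the attribute."""
    n = len(labels)
    conditional = 0.0
    for value in set(values):
        subset = [label for v, label in zip(values, labels) if v == value]
        conditional += len(subset) / n * label_entropy(subset)
    return label_entropy(labels) - conditional

print(round(entropy([0.7, 0.3]), 2))  # 0.88

labels = ["good", "good", "bad", "bad"]
print(information_gain(["a", "a", "b", "b"], labels))  # attribute separates the classes
print(information_gain(["a", "b", "a", "b"], labels))  # attribute tells us nothing
```

The attribute with the larger gain is the better candidate for a split.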
64
if you have a high information gain what does that mean?
the attribute is chosen as the first variable for the tree split
65
which classifier for these questions: | do I want class probabilities or just class labels?
logistic regression | decision tree
66
which classifier for these questions: | do I want insight into how the variables affect the model?
logistic regression | decision tree
67
which classifier for these questions: | is the problem high dimensional?
naive bayes
68
which classifier for these questions: | do I suspect some of the inputs are correlated?
decision trees | logistic regression
69
which classifier for these questions: | do I suspect some of the inputs are irrelevant?
decision tree | naive bayes
70
which classifier for these questions: | are there categorical variables with a large number of levels?
naive bayes | decision tree
71
which classifier for these questions: | are there mixed variable types?
decision tree | logistic regression
72
which classifier for these questions: | are there non-linear elements or discontinuities in the data?
decision tree
73
what is time series analysis?
analysis of equally spaced values observed over time
74
what does time series analysis do?
forecast
75
what is the difference between univariate time series and multivariate time series?
univariate tracks one variable over time | multivariate tracks several
76
in time series what is the Box-Jenkins method?
a methodology for fitting ARMA/ARIMA models to a time series in order to forecast future values
77
what does ARMA stand for?
autoregressive moving average
78
who popularised the ARMA model?
Box and Jenkins
79
what does the box-jenkins method assume the random component is?
stationary sequence
80
what does a stationary sequence mean? a) constant variance b) autocorrelation does not change c) constant deviance d) constant mean
a) constant variance | b) autocorrelation does not change | d) constant mean
81
to obtain a stationary sequence the data must be?
de-trended | seasonally adjusted
82
what does the ARIMA model do?
uses differencing to render the data stationary
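Differencing replaces each value with its change from the previous value. A minimal sketch (the `difference` helper is illustrative):

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [current - previous for previous, current in zip(series, series[1:])]
    return series

print(difference([3, 5, 7, 9, 11]))        # [2, 2, 2, 2]
print(difference([0, 1, 4, 9, 16], d=2))   # [2, 2, 2]
```

One round of differencing removes a linear trend and two rounds remove a quadratic one; each round shortens the series by one value.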
83
how do you remove a simple linear trend in time series?
by subtracting a least-squares-fit straight line
84
how do you do a seasonal adjustment for time series?
calculating the average for each month and subtracting them from the actual value
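The de-trending and seasonal-adjustment cards above can be sketched as two small helpers. Time is assumed to be the index 0, 1, 2, ... and `period=12` corresponds to monthly data; both function names are illustrative.

```python
def detrend(series):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
             / sum((t - mean_t) ** 2 for t in range(n)))
    intercept = mean_y - slope * mean_t
    return [y - (intercept + slope * t) for t, y in enumerate(series)]

def seasonal_adjust(series, period=12):
    """Subtract from each value the average of all values at the same seasonal position."""
    averages = []
    for position in range(period):
        values = series[position::period]
        averages.append(sum(values) / len(values))
    return [y - averages[i % period] for i, y in enumerate(series)]

print(detrend([1, 3, 5, 7]))                           # a pure line leaves nothing behind
print(seasonal_adjust([1, 2, 3, 1, 2, 3], period=3))   # a pure seasonal pattern leaves nothing behind
```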
85
what model uses P,Q in time series?
ARMA
86
in AR what is Y? a) Yt is a linear combination of its last p values b) Yt is a linear combination of its last q values
a) Yt is a linear combination of its last p values
87
in MA what is Y? a) Yt is a constant value plus the effects of a dampened white noise process over the last p time values (lags) b) Yt is a constant value plus the effects of a dampened white noise process over the last q time values (lags)
b) Yt is a constant value plus the effects of a dampened white noise process over the last q time values (lags)
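To make the AR and MA definitions concrete, the sketch below evaluates one value of each. The coefficients, history and noise values are invented, and an optional constant is included in the AR part.

```python
def ar_value(history, coefficients, constant=0.0):
    """AR(p): constant plus a linear combination of the last p values (most recent first)."""
    p = len(coefficients)
    lags = history[-1:-p - 1:-1]
    return constant + sum(c * y for c, y in zip(coefficients, lags))

def ma_value(mean, coefficients, shocks):
    """MA(q): constant mean plus the current shock plus weighted shocks from the last q lags.

    shocks holds the white-noise values in time order, current shock last.
    """
    q = len(coefficients)
    past = shocks[-2:-q - 2:-1]
    return mean + shocks[-1] + sum(c * e for c, e in zip(coefficients, past))

print(ar_value([1, 2, 3], [0.5, 0.25]))               # 0.5*3 + 0.25*2 = 2.0
print(ma_value(10, [0.5, 0.25], [0.5, -1.0, 2.0]))    # 10 + 2.0 + 0.5*(-1.0) + 0.25*0.5 = 11.625
```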
88
What is the d in ARIMA (p,d,q)?
differencing term
89
what does ARIMA stand for?
autoregressive integrated moving average
90
what does p mean in time series (ARMA, ARIMA)?
number of autoregressive terms
91
what does d mean in time series (ARMA, ARIMA)?
the number of differences
92
what does q mean in time series (ARMA, ARIMA)?
the number of moving average terms
93
in time series, what does ACF mean?
autocorrelation function
94
what is ACF?
the correlation of a series with lagged copies of itself; it provides an indication of the stationarity of the data
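A sketch of the sample autocorrelation at a given lag, using the usual estimator that divides by the lag-0 sum of squares. The series are invented.

```python
def acf(series, lag):
    """Sample autocorrelation of the series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((y - mean) ** 2 for y in series)
    numerator = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numerator / denominator

trending = list(range(1, 11))
alternating = [1, -1] * 4
print([round(acf(trending, k), 3) for k in range(4)])
print(acf(alternating, 1))  # -0.875
```

An ACF that decays slowly across lags, as it does for the trending series, is the usual hint that the data are not stationary.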
95
in time series, what does PACF mean?
partial autocorrelation function
96
what is PACF?
autocorrelation calculated after removing the linear dependence of the previous terms
97
what is text analysis?
processing of text
98
why is text analysis high-dimensional?
every word is a dimension
99
what are the three problem solving tasks in text analysis?
parsing | search/retrieval | text mining
100
what is parsing in text analysis?
imposing structure
101
what is search/retrieval in text analysis?
searching for word or phrase
102
what is a corpus?
a collection of text documents (body of knowledge)
103
what is text-mining in text analysis?
understanding the content
104
what is regex (regular expressions) in text analysis?
used for finding words, strings or patterns in text
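Two small regular-expression searches with Python's `re` module, on a made-up sentence:

```python
import re

text = "Data science uses data mining; database logs from 2024-05-01 and 2023-12-31."

# Words that start with "data", ignoring case.
data_words = re.findall(r"\bdata\w*", text, flags=re.IGNORECASE)

# Dates written as YYYY-MM-DD.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(data_words)  # ['Data', 'data', 'database']
print(dates)       # ['2024-05-01', '2023-12-31']
```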
105
what is bag of words in text analysis?
represents a document as its terms and their term frequency (tf), ignoring word order
106
what is reverse index in text analysis?
a list of all the documents that contain that feature
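The bag-of-words and reverse-index cards above can be sketched together. The tokenizer is a deliberately simple assumption (lowercase, letters, digits and apostrophes only), and the three documents are invented.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text):
    """Term frequencies for one document; word order is discarded."""
    return Counter(tokenize(text))

def reverse_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in set(tokenize(text)):
            index[term].add(doc_id)
    return index

documents = {
    "d1": "The cat sat on the mat",
    "d2": "The dog chased the cat",
    "d3": "Dogs bark",
}
print(bag_of_words(documents["d1"]))
index = reverse_index(documents)
print(sorted(index["cat"]))  # ['d1', 'd2']
```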
107
what is IDF in text analysis?
inverse document frequency
108
what are the metrics in text analysis that determine the quality of results? a) recall, relevance, confidence b) relevance, precision, recall c) relevance, lift, recall
b) relevance, precision, recall
109
what does IDF do?
measures the uniqueness of a term in the corpus
110
what does tf-idf mean?
term frequency multiplied by inverse document frequency: a measure of how relevant a term is to a document
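A sketch of one common TF-IDF variant: raw term count multiplied by log(N / document frequency). Other weightings exist, so the formula choice here is an assumption, as are the toy documents.

```python
import math

def tf_idf(term, document, corpus):
    """document is a list of tokens; corpus is a list of such documents."""
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)  # zero when the term appears in every document
    return tf * idf

corpus = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the bird flew".split(),
]
print(tf_idf("the", corpus[0], corpus))            # 0.0: appears everywhere, so not distinctive
print(round(tf_idf("cat", corpus[0], corpus), 3))  # rare term scores higher
```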