Analytic Techniques Flashcards
what technique would you use if you needed to group items or find structure?
a) regression
b) clustering
c) time series
b) clustering
what technique would you use if you needed to discover relationships between actions or items?
a) text analysis
b) regression
c) classification
d) association rules
d) association rules
what technique would you use if you needed to determine the relationship between the input variables and the outcome?
a) text analysis
b) regression
c) time series
b) regression
what technique would you use if you needed to assign labels to objects?
a) classification
b) text analysis
c) regression
a) classification
what technique would you use if you needed to find structure in temporal data in order to make forecasts?
a) classification
b) text analysis
c) time series
c) time series
what technique would you use if you needed to analyse free text?
a) time series
b) clustering
c) classification
d) text analysis
d) text analysis
which algorithm is used for clustering?
k-means
which algorithms are used for regression?
linear and logistic
which algorithms are used for classification?
naive bayes
decision trees
which algorithm is used for association rules?
apriori
which methods are used for time series?
ARMA, ARIMA, PACF & ACF
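As a rough illustration of what the ACF measures, the sketch below computes the sample autocorrelation of a series at a given lag. The function name and the short trending series are hypothetical, chosen only so the result is easy to check by hand.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (0 <= lag < len)."""
    n = len(series)
    mean = sum(series) / n
    # Total variation around the mean (the lag-0 term).
    denom = sum((x - mean) ** 2 for x in series)
    # Co-variation between each value and the value `lag` steps later.
    numer = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom

# Hypothetical steadily rising series: neighbouring values move together,
# so the lag-1 autocorrelation is positive.
series = [1, 2, 3, 4, 5, 6, 7, 8]
acf_lag1 = autocorrelation(series, 1)
print(acf_lag1)
```

Plotting this value for lags 1, 2, 3, ... gives the ACF plot that is typically inspected when choosing ARMA or ARIMA orders.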
which methods are used for text analysis?
regular expressions
bag of words
TF-IDF
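The three text analysis methods above can be sketched together: a regular expression splits text into tokens, a bag of words counts them, and TF-IDF weights each count by how rare the term is across documents. TF-IDF has several variants; this sketch assumes term frequency = count / document length and IDF = ln(N / document frequency). The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Regular expression: runs of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def tf_idf(documents):
    # Bag of words: one term-count mapping per document, word order discarded.
    bags = [Counter(tokenize(d)) for d in documents]
    n_docs = len(bags)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for bag in bags for term in bag)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return bags, scores

docs = ["the cat sat", "the dog sat", "the cat ran"]
bags, scores = tf_idf(docs)
print(scores[1])
```

Under this variant, a term that occurs in every document ("the") scores zero, while a term unique to one document ("dog") scores highest in that document.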
which methods are unsupervised learning methods?
k-means
apriori
what is the output of k-means?
the cluster centres (centroids)
what is the input of k-means?
numerical attributes, compared using Euclidean distance
what is Euclidean distance?
the ordinary straight-line distance between two points: the square root of the sum of the squared coordinate differences
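A minimal sketch of the Euclidean distance calculation, using a hypothetical helper name and points chosen so the result can be checked by hand:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```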
if a domain does not suggest a suitable value for k then what do you do?
plot the within-cluster sum of squares (WSS) against k and look for the elbow
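The elbow heuristic can be sketched with a small pure-Python k-means (Lloyd's algorithm). This is an illustration under stated assumptions, not a production implementation: the starting centroids are simply the first k points so the run is deterministic, and the six sample points are hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, starting from the first k points as centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

def wss(centroids, clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    return sum(
        math.dist(p, c) ** 2 for c, cluster in zip(centroids, clusters) for p in cluster
    )

# Hypothetical data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
wss_by_k = {k: wss(*kmeans(points, k)) for k in range(1, 5)}
for k, value in wss_by_k.items():
    print(k, round(value, 3))
```

On this data WSS drops sharply from k = 1 to k = 2 and only slightly afterwards, so the elbow suggests k = 2.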
in k-means, what do you do if it is missing expected splits?
increase k
in k-means, what do you do if its clusters have few data points?
decrease k
in k-means, what do you do if the centroids are close together?
decrease k
what is the right description for apriori?
a) if y is observed, then x is also observed
b) if x is observed, then y is also observed
b) if x is observed, then y is also observed
what are association rules sometimes referred to as?
a) market analysis
b) market basket analysis
c) task basket analysis
b) market basket analysis
what is a frequent itemset for apriori?
set of items that appear together “often enough”
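To make "often enough" concrete, the sketch below finds frequent itemsets level by level, using the apriori property that an itemset can only be frequent if all of its subsets are frequent. The function name, the sample baskets, and the 50% minimum support are assumptions for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    frequent = {}
    current = [frozenset([i]) for i in items]  # level 1: single items
    size = 1
    while current:
        # Support = fraction of baskets containing every item in the candidate.
        level = {}
        for candidate in current:
            support = sum(candidate <= b for b in baskets) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        size += 1
        # Build the next level, keeping only candidates whose subsets are all frequent.
        pool = sorted({i for s in level for i in s})
        current = [
            frozenset(c)
            for c in combinations(pool, size)
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in frequent.items():
    print(sorted(itemset), support)
```

Here {butter, milk} appears in only one of four baskets, so it is dropped, and the three-item set is never counted because one of its subsets is infrequent.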
what is normally the support % for apriori? (confidence)
50%
what is confidence in apriori?
% of transactions that contain x that also contain y
in apriori, what does lift mean?
how many times more often x and y occur together than would be expected if they were independent
in apriori, what does leverage mean?
the difference between the probability of x and y appearing together and the probability expected if they were independent
how do you work out confidence with apriori? for example, 700 have good credit and 544 of those have a skilled job
a) 700/544
b) 544/700
b) 544/700
what is a test set?
held-back baskets with a few random items removed, used to check whether the rules can fill in the blanks
how do you work out lift if, out of 1000 records, 713 are home owners, 527 of those have good credit, and 700 have good credit overall?
a) 0.527/(0.700*0.713)
b) 527/(0.700*0.713)
c) 527/713
a) 0.527/(0.700*0.713)
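The rule measures from the cards above can be computed directly from counts. This sketch uses the home owner and good credit figures from the lift card; the total of 1000 records is an assumption implied by the proportions 0.527, 0.700 and 0.713, and the function name is hypothetical.

```python
def rule_metrics(n_total, n_x, n_y, n_xy):
    """Support, confidence, lift and leverage for the rule x -> y, from raw counts."""
    p_x = n_x / n_total    # P(x)
    p_y = n_y / n_total    # P(y)
    p_xy = n_xy / n_total  # P(x and y)
    return {
        "support": p_xy,
        "confidence": p_xy / p_x,        # share of x transactions that also contain y
        "lift": p_xy / (p_x * p_y),      # ratio to the rate expected under independence
        "leverage": p_xy - p_x * p_y,    # difference from the rate expected under independence
    }

# x = home owner (713), y = good credit (700), both together (527), assumed 1000 records.
metrics = rule_metrics(n_total=1000, n_x=713, n_y=700, n_xy=527)
for name, value in metrics.items():
    print(name, round(value, 4))
```

A lift just above 1 (roughly 1.06 here) and a leverage just above 0 both indicate that the two items co-occur only slightly more often than independence would predict.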
what does regression do?
a) looks at a variable between inputs and the outcome
b) looks at the relationship between a set of variables and the outcome
c) looks at the relationship between a set of outputs
b) looks at the relationship between a set of variables and the outcome
what is linear regression?
used to estimate a continuous outcome as a linear function of the input variables
in regression, what does OLS stand for?
Ordinary least squares
in regression, what does OLS do?
finds the best fit line
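For a single input variable, the best fit line has a closed-form solution: the slope and intercept that minimise the sum of squared differences between observed and estimated outcomes. A minimal sketch, with a hypothetical function name and made-up data:

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(xs, ys)

# Differences between the observed and the estimated outcomes.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(round(intercept, 2), round(slope, 2))
print([round(r, 2) for r in residuals])
```

With an intercept in the model, the OLS residuals always sum to zero (up to floating-point error), which is a quick sanity check on a fit.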
what is the p-value in regression?
a) p-value can be used to look for numeric input values
b) p-value can be used to determine if the coefficient is significantly not different than zero.
c) p-value can be used to determine if the coefficient is significantly different than zero.
c) p-value can be used to determine if the coefficient is significantly different than zero.
what does a large p-value mean?
a) null hypothesis is rejected
b) null hypothesis is not rejected
b) null hypothesis is not rejected
what are residuals in regression?
a) the similarities between the observed and the estimated outcomes
b) the differences between the observed and the estimated outcomes
b) the differences between the observed and the estimated outcomes
what is logistic regression?
used to estimate the probability that an event will occur (e.g. the probability that a borrower will default)
what can logistic regression also be considered as?
classifier
what is the standard threshold of logistic regression?
0.5 (50%)
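A sketch of how logistic regression turns into a classifier: the model produces a probability through the logistic (sigmoid) function, and the 0.5 threshold converts it into a yes/no label. The intercept and coefficient values below are hypothetical, not fitted to any data.

```python
import math

def predict_probability(features, intercept, coefficients):
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(coef * x))))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Label the event as predicted (True) when the probability reaches the threshold."""
    return probability >= threshold

# Hypothetical coefficients for a single input variable.
p = predict_probability([2.0], intercept=-1.0, coefficients=[1.5])
print(round(p, 3), classify(p))
```

The threshold is a parameter so it can be moved away from 0.5 when the two kinds of misclassification have different costs.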
What is the preferred method for binary classification problems?
Logistic regression
Which is not a binary classification problem?
a) true/false
b) approve/deny
c) responds to medical treatment/does not respond
d) confidence/lift
d) confidence/lift
what does pseudo-r2 mean?
a) 1 - deviance/null deviance
b) r squared
c) square root
a) 1 - deviance/null deviance
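The deviance-based pseudo-R2 is commonly computed as 1 - deviance/null deviance, where deviance is -2 times the log-likelihood and the null deviance comes from a model that predicts the overall event rate for every observation. A minimal sketch with hypothetical outcomes and predicted probabilities (not taken from a fitted model):

```python
import math

def deviance(y_true, p_pred):
    """-2 * log-likelihood of binary outcomes under the predicted probabilities."""
    return -2.0 * sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )

def pseudo_r2(y_true, p_pred):
    """1 - deviance / null deviance."""
    # Null model: every observation gets the overall event rate.
    base_rate = sum(y_true) / len(y_true)
    null_deviance = deviance(y_true, [base_rate] * len(y_true))
    return 1.0 - deviance(y_true, p_pred) / null_deviance

# Hypothetical outcomes and model probabilities.
y = [1, 0, 1, 1, 0, 0]
p = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1]
r2 = pseudo_r2(y, p)
print(round(r2, 3))
```

A value near 0 means the model explains little beyond the base rate; a value near 1 means the predicted probabilities sit close to the observed outcomes.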
what is naive Bayes?
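a probabilistic classifier that applies Bayes' theorem, assuming the input variables are conditionally independent given the class, to determine the most probable class label for each object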
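A small sketch of the idea for categorical inputs: count how often each value occurs with each class, then pick the class with the highest combined probability. Add-one (Laplace) smoothing is an assumption added here so that unseen value and class combinations do not give zero probability; the function names and the tiny credit-style dataset are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for each feature."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] = number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the class label with the highest posterior score for the row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        # Log prior plus log likelihood of each feature value (naive independence).
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = feature_counts[label][i][value] + 1  # add-one smoothing
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: (housing, job) -> credit rating.
rows = [
    ("own", "skilled"), ("own", "skilled"), ("rent", "unskilled"),
    ("rent", "skilled"), ("own", "unskilled"), ("rent", "unskilled"),
]
labels = ["good", "good", "bad", "good", "good", "bad"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("own", "skilled")))
print(predict(model, ("rent", "unskilled")))
```

Working in log probabilities avoids numerical underflow when many features are multiplied together.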