Midterm Qs Flashcards
Logistic regression assumes that
the log odds of the response are linear in the predictors
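A small sketch of what "linear log odds" means, assuming a single predictor x and made-up coefficients b0 and b1 (both hypothetical, not from the course): the predicted probability is curved in x, but converting it back to log odds recovers the straight line b0 + b1 * x.

```python
import math

def predict_proba(x, b0, b1):
    # Logistic (sigmoid) transform of the linear predictor b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_odds(p):
    # log(p / (1 - p)), the inverse of the logistic transform
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration
b0, b1 = -1.0, 0.5
for x in (0, 2, 4):
    p = predict_proba(x, b0, b1)
    # log_odds(p) matches b0 + b1 * x up to floating-point error
    print(x, round(p, 4), round(log_odds(p), 4))
```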
Distances between observations measured on mixed (categorical, continuous and binary) variables can be calculated using
Gower’s distance
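A rough sketch of the idea, assuming the simplest form of Gower's distance: numeric variables contribute |x - y| / range, categorical and binary variables contribute 0 on a match and 1 on a mismatch, and the result is the plain average over variables. The function name, the "num"/"cat" labels and the example records are made up for illustration; weights and missing values are not handled.

```python
def gower_distance(a, b, kinds, ranges):
    """Average per-variable dissimilarity between two records.

    kinds  : "num" or "cat" for each variable (labels assumed for this sketch)
    ranges : range (max - min) of each numeric variable in the data set,
             None for categorical/binary variables
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / r           # scaled to [0, 1]
        else:
            total += 0.0 if x == y else 1.0   # simple match / mismatch
    return total / len(a)

# Hypothetical records: (age, colour, smoker)
a = (30, "red", 1)
b = (40, "blue", 1)
d = gower_distance(a, b, kinds=("num", "cat", "cat"), ranges=(20, None, None))
# (10/20 + 1 + 0) / 3 = 0.5
```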
Asymmetric binary distance ignores
0-0 matches
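To make the "ignores 0-0 matches" point concrete, here is a minimal sketch assuming the usual Jaccard-style definition: mismatches divided by the number of positions where at least one observation is 1. The function name and example vectors are made up for illustration.

```python
def asymmetric_binary_distance(x, y):
    # Positions where both are 1 (1-1 matches)
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    # Positions where the two observations disagree
    mismatch = sum(1 for p, q in zip(x, y) if p != q)
    # 0-0 matches never enter the denominator
    denom = both + mismatch
    return mismatch / denom if denom else 0.0

x = [1, 0, 0, 0, 1]
y = [1, 1, 0, 0, 0]
d = asymmetric_binary_distance(x, y)
# 2 mismatches, one 1-1 match: 2 / 3
# (simple matching, which counts the two 0-0 matches, would give 2 / 5)
```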
The bootstrap, and its most common use, can be best summarized as
Repeated sampling from the data with replacement, fitting a model to those samples and investigating the changes in parameter estimates
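A minimal sketch of the procedure, assuming the "model" is just the sample mean (any other estimator could be passed in as stat). The function name, the data and the default of 1000 resamples are illustrative choices, not part of the course material.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=1000, seed=0):
    """Bootstrap estimate of the standard error of stat(data)."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations from the data WITH replacement
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Re-fit (here: recompute the statistic) on the resample
        estimates.append(stat(sample))
    # Spread of the estimates across resamples approximates the standard error
    return statistics.stdev(estimates)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]   # made-up numbers
se = bootstrap_se(data)
```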
The jackknife method could be summarized as
Application of CV to estimation of standard errors and bias
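A short sketch of the leave-one-out idea, assuming the standard jackknife formulas for bias and standard error. The function name and data are made up; for the sample mean the result should agree with the textbook s / sqrt(n) and a bias of zero.

```python
import math
import statistics

def jackknife(data, stat=statistics.mean):
    """Jackknife bias and standard-error estimates for stat(data)."""
    n = len(data)
    full = stat(data)
    # Recompute the statistic n times, leaving out one observation each time
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    bias = (n - 1) * (loo_mean - full)
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return bias, se

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]   # made-up numbers
bias, se = jackknife(data)
```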
R function to fit logistic regression
glm() (with family = binomial)
Which of the following statements generally holds true about test and training sets?
The logloss of the test set equals the logloss of the training set
The logloss of the test set is less than the logloss of the training set
The logloss of the test set is larger than the logloss of the training set
The logloss of the test set does not generally have any relationship with the logloss of the training set
The logloss of the test set is larger than the logloss of the training set
A simple model is most at risk of suffering from high…
bias
A flexible model is most at risk of suffering from high…
variance
In multiple linear regression the R^2 value provides the…
proportion of the variation in the response variable explained by the model
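As a quick illustration, assuming the usual definition R^2 = 1 - SS_res / SS_tot, the value can be computed from observed and fitted responses. The function name and numbers below are made up.

```python
def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    # Variation left unexplained by the model (residual sum of squares)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation of the response around its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]        # observed responses (made up)
y_hat = [1.1, 1.9, 3.2, 3.8]    # fitted values from some model (made up)
r2 = r_squared(y, y_hat)
# SS_res = 0.1, SS_tot = 5.0, so R^2 = 0.98
```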
Q. 12, peep the pic
g
d
d
What is a p-value?
the probability of observing a test statistic as extreme as or more extreme than the one we observed, assuming the null hypothesis is true
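One way to see the definition in action, assuming a made-up example: 8 heads in 10 coin flips, with the null hypothesis that the coin is fair and a one-sided alternative. The p-value is the probability of 8 or more heads under the null. The function name is hypothetical.

```python
from math import comb

def binom_upper_p(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided p-value."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

p_value = binom_upper_p(8, 10)
# (45 + 10 + 1) / 1024 = 0.0546875
```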
look at pic
d
Which of the following is True about hierarchical clustering?
Resulting cluster memberships depend on random starting points (non-deterministic)
Resulting cluster memberships do not depend on the scale of the data (scale invariant)
Resulting cluster memberships do not depend on the distance-measure matrix
Resulting cluster memberships depend on a chosen linkage method
Resulting cluster memberships depend on a chosen linkage method
Which of the following statements is FALSE about k-means clustering?
Resulting cluster memberships depend on random starting points (non-deterministic)
Resulting cluster memberships depend on the scale of the data (not scale invariant)
Resulting cluster memberships provide the global maxima for the within group sum of squares
Resulting cluster memberships do not depend on a chosen linkage method
Resulting cluster memberships provide the global maxima for the within group sum of squares