Sample Q from SOA Flashcards

1
Q
  1. Determine which of the following statements is/are true.
    I. The number of clusters must be pre-specified for both K-means and
    hierarchical clustering.
    II. The K-means clustering algorithm is less sensitive to the presence of
    outliers than the hierarchical clustering algorithm.
    III. The K-means clustering algorithm requires random assignments while the
    hierarchical clustering algorithm does not.
(A) I only
(B) II only
(C) III only
(D) I, II and III
(E) The correct answer is not given by (A), (B), (C), or (D)
A

I is false because the number of clusters is pre-specified in the K-means algorithm but not for the hierarchical algorithm.
II is also false because both algorithms force each observation to a cluster so that both may be heavily distorted by the presence of outliers.
III is true.

2
Q
  1. Consider the following statements:
    I. Principal Component Analysis (PCA) provides low-dimensional linear
    surfaces that are closest to the observations.
    II. The first principal component is the line in p-dimensional space that is
    closest to the observations.
    III. PCA finds a low dimension representation of a dataset that contains as
    much variation as possible.
    IV. PCA serves as a tool for data visualization.

Determine which of the statements are correct.
(A) Statements I, II, and III only
(B) Statements I, II, and IV only
(C) Statements I, III, and IV only
(D) Statements II, III, and IV only
(E) Statements I, II, III, and IV are all correct

A

Statement I is correct – Principal components provide low-dimensional linear surfaces
that are closest to the observations.
Statement II is correct – The first principal component is the line in p-dimensional space
that is closest to the observations.
Statement III is correct – PCA finds a low dimension representation of a dataset that
contains as much variation as possible.
Statement IV is correct – PCA serves as a tool for data visualization.

3
Q
  1. Consider the following statements:
    I. The proportion of variance explained by an additional principal
    component never decreases as more principal components are added.
    II. The cumulative proportion of variance explained never decreases as more
    principal components are added.
    III. Using all possible principal components provides the best understanding
    of the data.
    IV. A scree plot provides a method for determining the number of principal
    components to use.
Determine which of the statements are correct.
(A) Statements I and II only
(B) Statements I and III only
(C) Statements I and IV only
(D) Statements II and III only
(E) Statements II and IV only
A

Statement I is incorrect – The proportion of variance explained by an additional principal
component decreases or stays the same as more principal components are added.
Statement II is correct – The cumulative proportion of variance explained increases or
stays the same as more principal components are added.
Statement III is incorrect – We want to use the least number of principal components
required to get the best understanding of the data.
Statement IV is correct – Typically, the number of principal components is chosen
based on a scree plot.
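
The difference between statements I and II can be checked with a small calculation. This is a minimal sketch: the component variances below are made-up numbers, assumed to be listed largest first as PCA produces them, and are not taken from any real data set.

```python
from itertools import accumulate

# Hypothetical variances (eigenvalues) of four principal components,
# listed in the order PCA produces them: largest first.
component_variances = [2.4, 1.0, 0.4, 0.2]

total = sum(component_variances)

# Proportion of variance explained (PVE) by each additional component.
pve = [v / total for v in component_variances]

# Cumulative PVE after adding each component.
cumulative_pve = list(accumulate(pve))

for k, (p, c) in enumerate(zip(pve, cumulative_pve), start=1):
    print(f"PC{k}: PVE = {p:.3f}, cumulative = {c:.3f}")
```

The per-component PVE never goes up from one component to the next, while the cumulative PVE never goes down and reaches 1 once all components are included. A scree plot is simply a plot of the per-component values.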

4
Q
  1. Determine which of the following pairs of distribution and link function is the
    most appropriate to model if a person is hospitalized or not.
    (A) Normal distribution, identity link function
    (B) Normal distribution, logit link function
    (C) Binomial distribution, linear link function
    (D) Binomial distribution, logit link function
    (E) It cannot be determined from the information given.
A
  1. Key: D
    The intent is to model a binary outcome, thus a classification model is desired. In
    a GLM, this corresponds to a binomial distribution. The link function should be one
    that restricts predicted values to the range zero to one. Of linear and logit, only
    logit has this property.
5
Q
  1. Determine which of the following statements describe the advantages of using an alternative fitting procedure, such as subset selection and shrinkage, instead of
    least squares.
    I. Doing so will likely result in a simpler model
    II. Doing so will likely improve prediction accuracy
    III. The results are likely to be easier to interpret
    (A) I only
    (B) II only
    (C) III only
    (D) I, II, and III
    (E) The correct answer is not given by (A), (B), (C), or (D)
A
  1. Key: D
    Alternative fitting procedures will tend to remove the irrelevant variables from the
    predictors, thus resulting in a simpler and easier to interpret model. Accuracy will likely be improved due to reduction in variance.
6
Q
  1. Determine which of the following statements about random forests is/are true?
    I. If the number of predictors used at each split is equal to the total number
    of available predictors, the result is the same as using bagging.
    II. When building a specific tree, the same subset of predictor variables is
    used at each split.
    III. Random forests are an improvement over bagging because the trees are
    decorrelated.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: C

II is false because with random forest a new subset of predictors is selected for each split.
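
Statements I and II can be illustrated with a short sketch of how candidate predictors might be drawn at each split. The predictor names and the helper `candidate_predictors` are hypothetical, and the code only shows the sampling step, not a full tree-building routine.

```python
import random

def candidate_predictors(predictors, m, rng):
    """Draw the m predictors considered at a single split."""
    return rng.sample(predictors, m)

# Hypothetical predictor names, for illustration only.
predictors = ["age", "sex", "region", "income", "tenure", "claims"]
rng = random.Random(0)

# Random forest: a fresh subset is drawn at every split (m is roughly sqrt(p) here).
m = 2
splits_rf = [candidate_predictors(predictors, m, rng) for _ in range(3)]

# Bagging is the special case m = p: every split sees all predictors.
splits_bag = [candidate_predictors(predictors, len(predictors), rng) for _ in range(3)]

print(splits_rf)
print([sorted(s) == sorted(predictors) for s in splits_bag])
```

Because each split makes its own draw, the subsets within one tree generally differ, which is why II is false. Setting m equal to the number of predictors removes the randomness in the choice of predictors, which is the sense in which I is true.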

7
Q
  1. Determine which of the following statements is true

(A) Linear regression is a flexible approach
(B) Lasso is more flexible than a linear regression approach
(C) Bagging is a low flexibility approach
(D) There are methods that have high flexibility and are also easy to interpret
(E) None of (A), (B), (C), or (D) are true

A
  1. Key: E
    A is false, linear regression is considered inflexible because the number of possible
    models is restricted to a certain form.
    B is false, the lasso determines the subset of variables to use while linear regression
    allows the analyst discretion regarding adding or removing variables.
    C is false, bagging provides additional flexibility.
    D is false, there is a tradeoff between being flexible and easy to interpret.
8
Q
  1. Determine which of the following statements is/are true for a simple linear
    relationship, y = β₀ + β₁x + ε.
    I. If ε = 0 , the 95% confidence interval is equal to the 95% prediction
    interval.
    II. The prediction interval is always at least as wide as the confidence
    interval.
    III. The prediction interval quantifies the possible range for E(y | x).
    (A) I only
    (B) II only
    (C) III only
    (D) I, II, and III
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: E
    I is true. The prediction interval includes the irreducible error, but in this case it is zero.
    II is true. Because it includes the irreducible error, the prediction interval is at least as
    wide as the confidence interval.
    III is false. It is the confidence interval that quantifies this range.
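
Statements I and II can be checked numerically. This is a minimal sketch using the usual textbook standard errors for a least-squares line; the data and the function name `interval_standard_errors` are hypothetical. Both intervals are built as estimate ± (t quantile) × (standard error) with the same quantile, so comparing the standard errors is enough to compare the widths.

```python
import math

def interval_standard_errors(x, y, x0):
    """Return (se_mean, se_pred) at x0 for a least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # estimate of the error variance
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)        # for E(y | x0): confidence interval
    se_pred = math.sqrt(s2 * (1 + leverage))  # for a new y at x0: prediction interval
    return se_mean, se_pred

x = [1, 2, 3, 4, 5]
noisy_y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data with noise
exact_y = [2 * xi for xi in x]         # no error term: points lie exactly on a line

print(interval_standard_errors(x, noisy_y, x0=3.5))
print(interval_standard_errors(x, exact_y, x0=3.5))
```

The extra "1 +" in the prediction standard error is the irreducible error. It makes the prediction interval at least as wide as the confidence interval, and it vanishes only when the estimated error variance is zero, in which case the two intervals coincide.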
9
Q
  1. From an investigation of the residuals of fitting a linear regression by ordinary
    least squares it is clear that the spread of the residuals increases as the predicted
    values increase. Observed values of the dependent variable range from 0 to 100.
    Determine which of the following statements is/are true with regard to transforming the
    dependent variable to make the variance of the residuals more constant.
    I. Taking the logarithm of one plus the value of the dependent variable may
    make the variance of the residuals more constant.
    II. A square root transformation may make the variance of the residuals more
    constant.
    III. A logit transformation may make the variance of the residuals more
    constant.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: B
    Adding a constant to the dependent variable avoids the problem of the logarithm of zero being negative infinity. In general, a log transformation may make the variance constant. Hence I is true.
    Power transformations with a power less than one, such as the square root transformation, may make the variance constant. Hence II is true.
    A logit transformation requires that the variable take on values between 0 and 1 and hence cannot be used here. Hence III is false.
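
A quick check of where each transformation is defined, as a minimal sketch. The four values are hypothetical observations chosen from the 0 to 100 range stated in the question, and `logit` is a helper written for this illustration.

```python
import math

def logit(p):
    """Log-odds; only defined for 0 < p < 1."""
    return math.log(p / (1 - p))

values = [0, 1, 25, 100]  # hypothetical observations from the 0-to-100 range

rows = []
for v in values:
    try:
        logit_v = logit(v)
    except (ValueError, ZeroDivisionError):
        logit_v = None  # undefined outside the open interval (0, 1)
    rows.append((v, math.log1p(v), math.sqrt(v), logit_v))

for row in rows:
    print(row)
```

log(1 + y) and the square root are defined for every value, including zero, and both compress large values more than small ones. The logit is undefined for all four values, which is why it cannot be applied to this dependent variable as it stands.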
10
Q
  1. Determine which of the following statements is applicable to K-means clustering
    and is not applicable to hierarchical clustering.
    (A) If two different people are given the same data and perform one iteration of the
    algorithm, their results at that point will be the same.
    (B) At each iteration of the algorithm, the number of clusters will be greater than the
    number of clusters in the previous iteration of the algorithm.
    (C) The algorithm needs to be run only once, regardless of how many clusters are
    ultimately decided to use.
    (D) The algorithm must be initialized with an assignment of the data points to a
    cluster.
    (E) None of (A), (B), (C), or (D) meets the stated criterion.
A
  1. Key: D
    (A) For K-means the initial cluster assignments are random. Thus different people can
    have different clusters, so the statement is not true for K-means clustering. It is true for
    hierarchical clustering.
    (B) For K-means the number of clusters is set in advance and does not change as the
    algorithm is run. For hierarchical clustering the number of clusters is determined after the
    algorithm is completed.
    (C) For K-means the algorithm needs to be re-run if the number of clusters is changed.
    This is not the case for hierarchical clustering.
    (D) This is true for K-means clustering. Agglomerative hierarchical clustering starts with
    each data point being its own cluster.
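
Points (A), (B) and (D) can be seen in a small sketch of K-means on one-dimensional data, started from a random assignment of points to clusters. The data, the function name `k_means_1d`, and the rule for re-seeding an empty cluster are assumptions made for this illustration, not part of the source.

```python
import random

def k_means_1d(points, k, seed, max_iter=100):
    """K-means on 1-D data, started from a random assignment of points to clusters."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random initial assignment
    for _ in range(max_iter):
        # Centroid of each cluster; an empty cluster is re-seeded at a random point
        # (an assumption of this sketch, other conventions exist).
        centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            centroids.append(sum(members) / len(members) if members else rng.choice(points))
        # Reassign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

points = [0.8, 1.0, 1.2, 5.0, 5.3, 9.0, 9.4]  # hypothetical data

for seed in range(5):
    print(seed, k_means_1d(points, k=3, seed=seed))
```

The number of clusters k is an input and never changes during the run, and the starting point is a random assignment, so runs with different seeds may label or even group the points differently. Changing k means running the whole procedure again.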
11
Q
  1. An analyst is modeling the probability of a certain phenomenon occurring. The
    analyst has observed that the simple linear model currently in use results in
    predicted values less than zero and greater than one.
    Determine which of the following is the most appropriate way to address this issue.
    (A) Limit the data to observations that are expected to result in predicted values
    between 0 and 1.
    (B) Consider predicted values below 0 as 0 and values above 1 as 1.
    (C) Use a logit function to transform the linear model into only predicting values
    between 0 and 1.
    (D) Use the canonical link function for the Poisson distribution to transform the linear
    model into only predicting values between 0 and 1.
    (E) None of the above
A
  1. Key: C
    (A) is not appropriate because removing data will likely bias the model estimates.
    (B) is not appropriate because altering data will likely bias the model estimates.
    (C) is correct.
    (D) is not appropriate because the canonical link function is the logarithm, which will not
    restrict values to the range zero to one.
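
The contrast between options (C) and (D) can be checked directly. A minimal sketch: the three functions are the inverses of the identity, log and logit links, and the linear predictor values are hypothetical.

```python
import math

def inverse_identity(eta):
    """Inverse of the identity link: the linear predictor is the prediction."""
    return eta

def inverse_log(eta):
    """Inverse of the log link (canonical for the Poisson distribution)."""
    return math.exp(eta)

def inverse_logit(eta):
    """Inverse of the logit link: the logistic function."""
    return 1 / (1 + math.exp(-eta))

linear_predictors = [-3.0, -0.5, 0.0, 0.5, 3.0]  # hypothetical values

for eta in linear_predictors:
    print(eta, inverse_identity(eta), inverse_log(eta), inverse_logit(eta))
```

The identity link gives predictions below zero and above one, the log link keeps predictions positive but lets them exceed one, and only the logit link keeps every prediction strictly between zero and one.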
12
Q
  1. A random walk is expressed as
    Y(t) = Y(t-1) + c for t = 1, 2, …
    where
    E(c) = µ(c) and Var(c) = σ(c)^2

Determine which of the following statements is/are true with respect to a random walk model.
I. If µ(c) ≠ 0, then the random walk is nonstationary in the mean.
II. If σ(c)^2 = 0, then the random walk is nonstationary in the variance.
III. If σ(c)^2 > 0, then the random walk is nonstationary in the variance.
(A) None
(B) I and II only
(C) I and III only
(D) II and III only
(E) The correct answer is not given by (A), (B), (C), or (D).

A
  1. Key: C
    I is true because the mean E[Y(t)] = Y(0) + t µ(c) depends on t.
    II is false because the variance Var[Y(t)] = t σ(c)^2 = 0 does not depend on t.
    III is true because the variance depends on t.
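
The dependence of the mean and variance on t can be seen in a small simulation. This is a sketch under stated assumptions: the increments are taken to be independent normal draws (the question does not specify a distribution), and the values of mu, sigma, the seed and the number of paths are arbitrary.

```python
import random
import statistics

def simulate_walk(steps, mu, sigma, rng, y0=0.0):
    """Return Y(0), ..., Y(steps) with Y(t) = Y(t-1) + c, c ~ Normal(mu, sigma^2)."""
    path = [y0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(mu, sigma))
    return path

rng = random.Random(42)
paths = [simulate_walk(100, mu=0.5, sigma=1.0, rng=rng) for _ in range(2000)]

# Mean and variance across the simulated paths at two time points.
summary = {}
for t in (10, 100):
    values_at_t = [p[t] for p in paths]
    summary[t] = (statistics.mean(values_at_t), statistics.variance(values_at_t))
    print(t, summary[t])

# With sigma = 0 every path is the same straight line, so the variance across paths is zero.
print(simulate_walk(4, mu=0.5, sigma=0.0, rng=rng))
```

With a nonzero drift and a positive variance, both the sample mean and the sample variance at t = 100 should come out roughly ten times their values at t = 10, in line with a mean of Y(0) + t µ(c) and a variance of t σ(c)^2.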
13
Q
  1. Determine which of the following statements concerning decision tree pruning
    is/are true.
    I. The recursive binary splitting method can lead to overfitting the data.
    II. A tree with more splits tends to have lower variance.
    III. When using the cost complexity pruning method, α = 0 results in a very
    large tree.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: C
    I is true because the method optimizes with respect to the training set, but may perform
    poorly on the test set.
    II is false because additional splits tend to increase variance due to adding to the
    complexity of the model.
    III is true because in this case only the training error is measured.
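
Statement III can be illustrated with the cost complexity criterion, training RSS plus α times the number of terminal nodes. A minimal sketch: the subtree sizes and RSS values are hypothetical, and `best_subtree` is a helper written for this illustration.

```python
# Hypothetical nested subtrees: (number of terminal nodes, training RSS).
# Training RSS can only fall as the tree grows.
subtrees = [(1, 100.0), (2, 60.0), (4, 40.0), (8, 34.0), (16, 32.0)]

def best_subtree(alpha):
    """Pick the subtree minimizing RSS + alpha * (number of terminal nodes)."""
    return min(subtrees, key=lambda t: t[1] + alpha * t[0])

for alpha in (0.0, 1.0, 5.0, 50.0):
    print(alpha, best_subtree(alpha))
```

At α = 0 there is no penalty on size, so the criterion is just the training error and the largest tree wins. As α grows, progressively smaller subtrees are selected.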
14
Q
  1. Determine which of the following considerations may make decision trees
    preferable to other statistical learning methods.
    I. Decision trees are easily interpretable.
    II. Decision trees can be displayed graphically.
    III. Decision trees are easier to explain than linear regression methods.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: E
    All three statements are true. See Section 8.1 of An Introduction to Statistical Learning.
    The statement that trees are easier to explain than linear regression methods may not be
    obvious. For those familiar with regression but just learning about trees, the reverse may
    be the case. However, for those not familiar with regression, relating the dependent
    variable to the independent variables, especially if the dependent variable has been
    transformed, can be difficult to explain.
15
Q
  1. Principal component analysis is applied to a large data set with four variables.
    Loadings for the first four principal components are estimated.
    Determine which of the following statements is/are true with respect to the loadings.
    I. The loadings are unique.
    II. For a given principal component, the sum of the squares of the loadings
    across the four variables is one.
    III. Together, the four principal components explain 100% of the variance.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: D
    I is false because the loadings are unique only up to a sign flip.
    II is true. Principal components are designed to maximize variance. If there are no
    constraints on the magnitude of the loadings, the variance can be made arbitrarily large.
    The PCA algorithm’s constraint is that the sum of the squares of the loadings equals 1.
    III is true because four components can capture all the variation in four variables,
    provided there are at least four data points (note that the problem states that the data set is
    large).
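
The three statements can be checked on a small case. This sketch uses two variables rather than the four in the question, so that the eigenvalues and loadings of the covariance matrix have a closed form. The covariance values and the function name `pca_2x2` are hypothetical.

```python
import math

def pca_2x2(var_x, var_y, cov_xy):
    """Eigenvalues and unit-length loading vectors of a 2x2 covariance matrix (cov_xy != 0)."""
    mid = (var_x + var_y) / 2
    half_gap = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    results = []
    for eigenvalue in (mid + half_gap, mid - half_gap):
        # (cov_xy, eigenvalue - var_x) is an eigenvector; rescale it to length 1.
        raw = (cov_xy, eigenvalue - var_x)
        norm = math.hypot(*raw)
        results.append((eigenvalue, (raw[0] / norm, raw[1] / norm)))
    return results

components = pca_2x2(var_x=4.0, var_y=1.0, cov_xy=1.2)  # hypothetical covariance matrix

for eigenvalue, loadings in components:
    flipped = tuple(-w for w in loadings)  # an equally valid loading vector
    print(eigenvalue, loadings, flipped, sum(w ** 2 for w in loadings))

print(sum(ev for ev, _ in components), 4.0 + 1.0)  # component variances vs. total variance
```

Each loading vector has squared entries summing to one, the sign-flipped vector satisfies the same constraint and describes the same direction, and the component variances add up to the total variance of the original variables.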
16
Q
  1. Determine which of the following indicates that a nonstationary time series can be
    represented as a random walk
    I. A control chart of the series detects a linear trend in time and increasing
    variability.
    II. The differenced series follows a white noise model.
    III. The standard deviation of the original series is greater than the standard
    deviation of the differenced series.
    (A) I only
    (B) II only
    (C) III only
    (D) I, II and III
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: D
    See Page 242 of Regression Modeling with Actuarial and Financial Applications.
    I is true because a random walk is characterized by a linear trend and increasing
    variability.
    II is true because differencing removes the linear trend and stabilizes the variance.
    III is true as both the linear trend and the increasing variability contribute to a higher
    standard deviation.
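
Statement II can be seen mechanically: differencing a random walk returns the increments it was built from. A minimal sketch, with hypothetical integer increments standing in for white-noise draws so that the arithmetic is exact.

```python
from itertools import accumulate

# Hypothetical white-noise draws (integers keep the arithmetic exact).
increments = [1, -2, 3, 0, 2, -1, 4]

# Random walk: Y(0) = 0 and Y(t) = Y(t-1) + c(t).
walk = list(accumulate(increments, initial=0))

# First differences Y(t) - Y(t-1).
differenced = [b - a for a, b in zip(walk, walk[1:])]

print(walk)
print(differenced)
```

If the differenced series looks like white noise, the original series is consistent with a random walk, since the walk is just the running sum of that noise.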
17
Q
  1. You are given a set of n observations, each with p features.
    Determine which of the following statements is/are true with respect to clustering
    methods.
    I. The n observations can be clustered on the basis of the p features to
    identify subgroups among the observations.
    II. The p features can be clustered on the basis of the n observations to
    identify subgroups among the features.
    III. Clustering is an unsupervised learning method and is often performed as
    part of an exploratory data analysis.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: E
    I and II are both true because the roles of rows and columns can be reversed in the
    clustering algorithm. (See Section 10.3 of An Introduction to Statistical Learning.)
    III is true. Clustering is unsupervised learning because there is no dependent (target)
    variable. It can be used in exploratory data analysis to learn about relationships between
    observations or features.
18
Q
  1. Determine which of the following statements is/are true about clustering methods:
    I. If K is held constant, K-means clustering will always produce the same
    cluster assignments.
    II. Given a linkage and a dissimilarity measure, hierarchical clustering will
    always produce the same cluster assignments for a specific number of
    clusters.
    III. Given identical data sets, cutting a dendrogram to obtain five clusters
    produces the same cluster assignments as K-means clustering with K = 5.
    (A) I only
    (B) II only
    (C) III only
    (D) I, II and III
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: B
    I is false. K-means clustering is subject to the random initial assignment of clusters.
    II is true. Hierarchical clustering is deterministic, not requiring a random initial
    assignment.
    III is false. The two methods differ in their approaches and hence may not yield the same
    clusters.
19
Q
  1. Determine which of the following statements about hierarchical clustering is/are
    true.
    I. The method may not assign extreme outliers to any cluster.
    II. The resulting dendrogram can be used to obtain different numbers of
    clusters.
    III. The method is not robust to small changes in the data.
    (A) None
    (B) I and II only
    (C) I and III only
    (D) II and III only
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: D
    I is false. All observations are assigned to a cluster.
    II is true. By cutting the dendrogram at different heights, any number of clusters can be
    obtained.
    III is true. Clustering methods have high variance, that is, having a different random
    sample from the population can lead to different clustering.
20
Q
  1. Determine which of the following statements about clustering is/are true.
    I. Cutting a dendrogram at a lower height will not decrease the number of
    clusters.
    II. K-means clustering requires plotting the data before determining the
    number of clusters.
    III. For a given number of clusters, hierarchical clustering can sometimes
    yield less accurate results than K-means clustering.
(A) None
(B) I and II only
(C) I and III only
(D) II and III only
(E) The correct answer is not given by (A), (B), (C), or (D)
A
  1. Key: C
    I is true. At the lowest height, each observation is its own cluster. The number of clusters
    decreases as the height increases.
    II is false. There is no need to plot the data to perform K-means clustering.
    III is true. K-means does a fresh analysis for each value of K while for hierarchical
    clustering, reduction in the number of clusters is tied to clusters already made. This can
    miss cases where the clusters are not nested.
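
Statement I can be checked with a small agglomerative clustering sketch. Single linkage and one-dimensional data are assumptions made to keep the example short, and the function names and data points are hypothetical.

```python
def single_linkage_merges(points):
    """Agglomerative clustering with single linkage; returns the merge heights in order."""
    clusters = [[p] for p in points]  # every observation starts as its own cluster
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
    return heights

def clusters_at_height(heights, n_points, cut):
    """Number of clusters left when the dendrogram is cut at the given height."""
    return n_points - sum(1 for h in heights if h <= cut)

points = [0.0, 1.0, 3.0, 7.0, 15.0]  # hypothetical data
heights = single_linkage_merges(points)

for cut in (0.5, 3.0, 10.0):
    print(cut, clusters_at_height(heights, len(points), cut))
```

The algorithm runs once and involves no random start. Lower cuts leave more clusters (or the same number), and each clustering is nested inside the one above it, which is the constraint referred to in the explanation of statement III.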
21
Q
  1. Determine which of the following statements is NOT true about clustering
    methods.
    (A) Clustering is used to discover structure within a data set.
    (B) Clustering is used to find homogeneous subgroups among the observations within
    a data set.
    (C) Clustering is an unsupervised learning method.
    (D) Clustering is used to reduce the dimensionality of a dataset while retaining
    explanation for a good fraction of the variance.
    (E) In K-means clustering, it is necessary to pre-specify the number of clusters.
A
  1. Key: D

Item D is a statement about principal components analysis, not clustering.

22
Q
  1. Determine which of the following statements regarding statistical learning
    methods is/are true.
    I. Methods that are highly interpretable are more likely to be highly flexible.
    II. When inference is the goal, there are clear advantages to using a lasso
    method versus a bagging method.
    III. Using a more flexible method will produce a more accurate prediction
    against unseen data.
    (A) I only
    (B) II only
    (C) III only
    (D) I, II and III
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: B
    I is false. Highly flexible models are harder to interpret. For example, a ninth degree
    polynomial is harder to interpret than a straight line.
    II is true. Inference is easier when using simple and relatively inflexible methods. Lasso is simpler and less flexible than bagging.
    III is false. Flexible methods tend to overfit the training set and be less accurate when applied to unseen data.
23
Q
  1. Determine which of the following statements is/are true about Pearson residuals.
    I. They can be used to calculate a goodness-of-fit statistic.
    II. They can be used to detect if additional variables of interest can be used to
    improve the model specification.
    III. They can be used to identify unusual observations.
    (A) I only
    (B) II only
    (C) III only
    (D) I, II, and III
    (E) The correct answer is not given by (A), (B), (C), or (D).
A
  1. Key: D
    I is true. See Page 348 of Frees.
    II is true. See Page 347 of Frees.
    III is true. See Page 347 of Frees.
24
Q
  1. Determine which of the following statements about prediction is true.

(A) Each of several candidate regression models must produce the same prediction.
(B) When making predictions, it is assumed that the new observation follows the
same model as the one used in the sample.
(C) A point prediction is more reliable than an interval prediction.
(D) A wider prediction interval is more informative than a narrower prediction
interval.
(E) A prediction interval should not contain the single point prediction.

A
  1. Key: B
    A is false because different models are likely to produce different results.
    B is true because using the fitted model implies that this model continues to apply.
    C is false as there is no easy way to compare reliability of the two approaches.
    D is false in that a narrower interval provides more useful information about the true
    value.
    E is false in that the interval contains the most likely values with the point prediction
    being the single most likely point.
25
Q

RESPONSE VS. EXPLANATORY

What is a response variable ?

A

= Dependent variable

is a variable of primary concern to an analyst. Typically, we hope to predict the response variable of a future observation and to see whether the response variable can be understood better using other variables.

26
Q

RESPONSE VS. EXPLANATORY

What is an explanatory variable ?

A

is any variable used to study the response variable. In other words, we aim to discover and exploit potential relationships that exist between the response variable and an explanatory variable

27
Q

Which of the following statements are true about hypothesis testing under a simple linear regression?
I) To find evidence for a negative slope parameter, the null hypothesis is that the slope parameter is less than 0
II) For a given dataset, a two-tailed t test will have different degrees of freedom compared to a one-tailed t test
III) For a given dataset and significance level, the positive critical value for a two-tailed t test is the same as the critical value for a right-tailed t test
IV) In failing to reject the null hypothesis that the slope parameter is 0, we conclude that there is a meaningful relationship between the response and explanatory variables.
V) There are situations where a large p-value leads to rejecting the null hypothesis.

A

NONE OF THE STATEMENTS ARE TRUE = (

Example: video 2.2.2, One-tailed t tests