DS1 Flashcards
An insurer’s business need for data obtained through statistical plans is mainly concerned with
Select one:
A. The cost and pricing of insurance.
B. Market conduct.
C. Regulatory requirements.
D. Financial solvency.
A. The cost and pricing of insurance.
Consider the predictive value of the following in determining a prospective employee’s job performance, as discussed in chapter six, “Ineligible to Serve,” of O’Neil’s book Weapons of Math Destruction:
1. Personality tests
2. Reference checks
3. Cognitive tests
What rank ordering is described?
Select one:
A. 1 < 3 < 2
B. 3 < 1 < 2
C. 1 < 2 < 3
D. 3 < 2 < 1
C. 1 < 2 < 3
A data set consists of heights for third-, fourth-, and fifth-grade students. The fitted value for each grade is the mean height. Which one of the following approaches best describes how to evaluate whether grade distributions differ only in location?
Select one:
A. Plot the fitted heights for one grade on the vertical scale and the fitted heights for all grades on the horizontal scale to evaluate the pattern of residuals.
B. Plot the quantiles of the residuals for one grade on the vertical scale and the fitted heights for all grades on the horizontal scale to evaluate the pattern of residuals.
C. Plot the quantiles of the residuals for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.
D. Plot the fitted heights for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.
C. Plot the quantiles of the residuals for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.
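A minimal sketch of the comparison in answer C, using made-up heights (not data from the source). Each grade's fitted value is its mean, residuals are observation minus fit, and the quantiles of one grade's residuals are paired with the quantiles of the pooled residuals. The function name `qq_pairs` and the choice of quartiles are illustrative assumptions. If the grades differ only in location, the pairs should fall close to the line y = x.

```python
from statistics import mean, quantiles

# Hypothetical heights (cm) by grade, for illustration only.
heights = {
    "third": [124, 128, 130, 131, 132, 134, 138],
    "fourth": [130, 134, 136, 137, 138, 140, 144],
    "fifth": [136, 140, 142, 143, 144, 146, 150],
}

# Fitted value for each grade is its mean; residual = observation - fit.
residuals = {g: [h - mean(hs) for h in hs] for g, hs in heights.items()}

# Residuals for all grades pooled together.
pooled = [r for rs in residuals.values() for r in rs]


def qq_pairs(grade, n=4):
    """Pair pooled-residual quantiles (horizontal) with one grade's
    residual quantiles (vertical), evaluated at the same n-quantile cuts."""
    return list(zip(quantiles(pooled, n=n), quantiles(residuals[grade], n=n)))


for grade in heights:
    print(grade, qq_pairs(grade))
```

In this toy data the three grades are exact shifts of one another, so every grade produces the same pairs; with real data the interest is in how far the points stray from the line.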
An independent state bureau is examining data in a unit statistical plan (USP). The USP indicates the underwriting experience (premiums and losses) collected for which type of insurance?
Select one:
A. General liability
B. Workers compensation
C. Surety
D. Personal auto
B. Workers compensation
Which one of the following statements, according to Loukides et al., cannot be described as one of the five C’s?
I. Agreement about what data is being collected and how that data will be used
II. Users must have clarity about what data they are providing, what is going to be done with the data, and any downstream consequences of how their data is used
III. An organization that exposes user data can do so if it has the best intentions
IV. It’s often impossible to reduce the amount of data collected, or to have data deleted later
Select one:
A. I
B. II
C. III
D. IV
C. III
Which statement about geoms in ggplot is true?
Select one:
A. The default geom smoothing method is loess.
B. Mapping colors can be continuous or categorical.
C. An object with an alpha of 1 will be completely transparent.
D. You must add points to your plot before adding a smoothed line.
B. Mapping colors can be continuous or categorical.
Let x(1), …, x(n) represent data set A, ordered from smallest to largest. Let y(1), …, y(n) represent data set B, ordered from smallest to largest. A q-q plot is constructed to compare distributions A and B. Distribution A is assigned to the horizontal axis and distribution B is assigned to the vertical axis. The points graphed on the panel consistently lie below the line A = B. C is a positive constant. Which one of the following relationships is most likely true?
Select one:
A. A - B = C
B. A + B = C
C. B - C = A
D. B - A = C
A. A - B = C
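The reasoning behind this answer can be checked with a small sketch. The data and the shift `C = 3.0` are invented for illustration. With equal sample sizes, the q-q plot pairs the ordered values x(i) and y(i); if B is A shifted down by a positive constant, every point sits below the line A = B by that same amount.

```python
# Hypothetical data: B is A shifted down by a positive constant C.
a = [12.0, 7.5, 9.25, 15.5, 10.0]
C = 3.0
b = [x - C for x in a]

# With equal sample sizes, the q-q plot pairs the ordered values x(i), y(i).
qq_points = list(zip(sorted(a), sorted(b)))  # (horizontal A, vertical B)

# A point lies below the line A = B when its vertical value is smaller.
below_line = all(y < x for x, y in qq_points)

# Vertical gap between the line and each point.
differences = [x - y for x, y in qq_points]

print(below_line, differences)
```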
Which one of the following is a level of the CRISP-DM hierarchical process model?
Select one:
A. Process Instances
B. Data understanding
C. Assess Situation
D. Tool and Technique
A. Process Instances
In Healy, there is a case study related to a slide created to evaluate Marissa Mayer’s performance as CEO of Yahoo. What is the biggest problem noted about this slide in the text?
Select one:
A. Dual axes can be scaled to misrepresent the association in the variables.
B. For this topic it is not appropriate to have time on the x axis.
C. The color theme doesn’t align with best practices for preattentive processing.
D. The overall message of the slide is unclear.
D. The overall message of the slide is unclear.
Blind tests help increase reliability by helping to prevent which effect?
Select one:
A. Experimental Design Effect
B. Hawthorne Effect
C. Piper Effect
D. Vision Effect
B. Hawthorne Effect
When using a power transformation to adjust non-normal variables, the effectiveness of the transformation depends on the selection of an appropriate T parameter. Which one of the following statements regarding the selection of the T parameter is true?
Select one:
A. If the ratio of the largest observation to the smallest observation of a data set is very close to 1, power transformations with T from -1 to 1 have a large effect.
B. For data sets with zeroes, a power transformation with a T parameter less than zero will be most effective.
C. A trial and error method can be used to identify the value of the T parameter that will be most effective.
D. For a highly skewed distribution, a power transformation with the parameter T = 1 will be most effective.
C. A trial and error method can be used to identify the value of the T parameter that will be most effective.
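A rough sketch of the trial-and-error idea, under stated assumptions: the data are invented, the candidate values are arbitrary, the logarithm is used at T = 0 (a common convention for power transformations), and moment skewness is used as a stand-in for judging symmetry by eye on a plot. The helper names `power_transform` and `skewness` are hypothetical.

```python
import math


def power_transform(values, t):
    """Apply x**t, with the logarithm assumed as the T = 0 case."""
    return [math.log(v) if t == 0 else v ** t for v in values]


def skewness(values):
    """Moment-based skewness; near 0 for a symmetric batch of data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positive, right-skewed data (symmetric on the log scale).
data = [math.exp(v) for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]

# Try several candidate values and keep the one giving the most symmetry.
candidates = [-1, -0.5, 0, 0.5, 1]
results = {t: skewness(power_transform(data, t)) for t in candidates}
best_t = min(candidates, key=lambda t: abs(results[t]))

print(results)
print("most symmetric at T =", best_t)
```

In practice each candidate would be judged from a quantile or q-q plot rather than a single summary number; the loop only illustrates the search.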
Robust estimation techniques are valuable for visualizing non-normal data. To assess whether the residuals for different groups of a non-normal data set may be pooled, distributions of the spread-standardized residuals may be graphed by normal q-q plots. Which one of the following descriptions correctly defines the spread-standardized residual?
Select one:
A. The difference between a transformed observation and its group median, divided by its group standard deviation
B. The difference between a transformed observation and its group median
C. The difference between a transformed observation and its group mean, divided by its group standard deviation
D. The difference between a transformed observation and its group median, divided by its group median absolute deviation
D. The difference between a transformed observation and its group median, divided by its group median absolute deviation
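A small sketch of the calculation, assuming the spread measure ("mad") is the median absolute deviation about the group median, and assuming the observations have already been transformed. The two groups and the helper name `spread_standardized` are made up for illustration.

```python
from statistics import median

# Hypothetical (already transformed) observations for two groups.
groups = {
    "a": [1, 2, 3, 4, 5],
    "b": [10, 20, 30, 40, 50],
}


def spread_standardized(values):
    """Residuals from the group median, divided by the group mad
    (assumed here to be the median absolute deviation)."""
    center = median(values)
    resid = [v - center for v in values]
    mad = median(abs(r) for r in resid)
    return [r / mad for r in resid]


standardized = {g: spread_standardized(vs) for g, vs in groups.items()}
print(standardized)
```

The two toy groups differ in location and spread only, so their spread-standardized residuals coincide, which is the situation in which pooling would be reasonable.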
A data set of observations quantifying mobile phone battery life is skewed toward large values. It is most likely that
Select one:
A. The values on a quantile plot are symmetric.
B. The quantile plot displays a convex pattern.
C. The distribution is well-approximated by the normal distribution.
D. The median and the mean measure the same aspect of the distribution.
B. The quantile plot displays a convex pattern.
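One way to see the convex pattern, as a sketch with invented numbers: order the data, pair each value with an f-value (the convention f = (i - 0.5)/n is an assumption here), and check that the slope between neighboring points keeps increasing as f increases.

```python
# Hypothetical battery-life data (hours), skewed toward large values.
battery_hours = [1, 2, 4, 8, 16, 32, 64, 128]

xs = sorted(battery_hours)
n = len(xs)

# f-values paired with the ordered data (assumed convention).
f = [(i - 0.5) / n for i in range(1, n + 1)]

# Slope of the quantile plot between neighboring points.
slopes = [(xs[i + 1] - xs[i]) / (f[i + 1] - f[i]) for i in range(n - 1)]

# Convex pattern: each slope is larger than the one before it.
is_convex = all(later > earlier for earlier, later in zip(slopes, slopes[1:]))
print(is_convex)
```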
Quantiles are essential to visualizing distributions. Which one of the following statements is true of quantiles?
Select one:
A. The precise form of f_i is important.
B. A fraction f of the data is greater than q(f).
C. No explicit rule is needed to compute q(f).
D. The f-values provide a standard for comparison.
D. The f-values provide a standard for comparison.
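To make q(f) concrete, here is a sketch of one explicit rule, stated as an assumption rather than the only possibility: take f_i = (i - 0.5)/n so that q(f_i) is the i-th ordered value, and interpolate linearly for other values of f. The function names `f_values` and `quantile` are illustrative.

```python
import math


def f_values(n):
    """f-values paired with the ordered data: f_i = (i - 0.5) / n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def quantile(data, f):
    """q(f) by linear interpolation between ordered values,
    chosen so that q(f_i) equals the i-th smallest observation."""
    xs = sorted(data)
    n = len(xs)
    pos = f * n + 0.5            # 1-based fractional position in the ordering
    pos = min(max(pos, 1), n)    # clamp f below f_1 or above f_n to the ends
    lo = math.floor(pos)
    if lo >= n:
        return xs[-1]
    frac = pos - lo
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


data = [30, 10, 40, 20]
print(f_values(len(data)))
print([quantile(data, f) for f in f_values(len(data))])
print(quantile(data, 0.5))
```

Because two data sets of different sizes can both be evaluated at the same f-values, the f-values give the common standard on which their quantiles are compared.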
The Review Process includes which one of the following activities?
Select one:
A. Rank results with respect to business success criteria.
B. Identify misleading steps.
C. Determine deployment strategy.
D. Select best model.
B. Identify misleading steps.