DS1 Flashcards
An insurer’s business need for data obtained through statistical plans is mainly concerned with
Select one:
A. The cost and pricing of insurance.
B. Market conduct.
C. Regulatory requirements.
D. Financial solvency.
A. The cost and pricing of insurance.
A data set consists of heights for third-, fourth-, and fifth-grade students. The fitted value for each grade is the mean height. Which one of the following approaches best describes how to evaluate whether grade distributions differ only in location?
Select one:
A. Plot the fitted heights for one grade on the vertical scale and the fitted heights for all grades on the horizontal scale to evaluate the pattern of residuals.
B. Plot the quantiles of the residuals for one grade on the vertical scale and the fitted heights for all grades on the horizontal scale to evaluate the pattern of residuals.
C. Plot the quantiles of the residuals for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.
D. Plot the fitted heights for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.
C. Plot the quantiles of the residuals for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.
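As a rough illustration of option C, the sketch below computes residuals for each grade (height minus the grade mean) and pairs the quantiles of one grade's residuals with the quantiles of the pooled residuals at the same f-values. The heights, the helper names, and the f_i = (i - 0.5) / n interpolation rule are assumptions made for this example; in practice the pairs would be plotted and compared with the line y = x.

```python
import statistics

# Hypothetical heights in cm; the numbers are invented for illustration.
heights = {
    "third": [128.0, 131.5, 133.0, 135.5, 137.0],
    "fourth": [133.0, 136.5, 138.0, 140.5, 142.0],
    "fifth": [138.0, 141.5, 143.0, 145.5, 147.0],
}


def residuals(values):
    """Return each observation minus the fitted value (the group mean)."""
    fitted = statistics.mean(values)
    return [v - fitted for v in values]


def quantile(sorted_values, f):
    """Interpolate q(f), assuming the i-th sorted value has f_i = (i - 0.5) / n."""
    n = len(sorted_values)
    pos = min(max(n * f + 0.5, 1.0), float(n))  # 1-based position in the sorted data
    lower = int(pos)
    if lower >= n:
        return sorted_values[-1]
    frac = pos - lower
    return sorted_values[lower - 1] + frac * (sorted_values[lower] - sorted_values[lower - 1])


def residual_qq_pairs(grade):
    """Pair pooled residual quantiles (horizontal) with one grade's (vertical)."""
    grade_res = sorted(residuals(heights[grade]))
    pooled = sorted(r for vals in heights.values() for r in residuals(vals))
    n = len(grade_res)
    f_values = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(quantile(pooled, f), g) for f, g in zip(f_values, grade_res)]


pairs = residual_qq_pairs("third")
```

Because the invented grades differ only by a shift, every pair lands on the line y = x. Points that bend away from that line would suggest the distributions differ in more than location.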
An independent state bureau is examining data in a unit statistical plan (USP). The USP indicates the underwriting experience (premiums and losses) collected for which type of insurance?
Select one:
A. General liability
B. Workers compensation
C. Surety
D. Personal auto
B. Workers compensation
Which statement about geoms in ggplot is true?
Select one:
A. The default geom smoothing method is loess.
B. Mapping colors can be continuous or categorical.
C. An object with an alpha of 1 will be completely transparent.
D. You must add points to your plot before adding a smoothed line.
B. Mapping colors can be continuous or categorical.
In Healy, there is a case study related to a slide created to evaluate Marissa Mayer’s performance as CEO of Yahoo. What is the biggest problem noted about this slide in the text?
Select one:
A. Dual axes can be scaled to misrepresent the association in the variables.
B. For this topic it is not appropriate to have time on the x axis.
C. The color theme doesn’t align with best practices for preattentive processing.
D. The overall message of the slide is unclear.
D. The overall message of the slide is unclear.
When using a power transformation to adjust non-normal variables, the effectiveness of the transformation depends on the selection of an appropriate T parameter. Which one of the following statements regarding the selection of the T parameter is true?
Select one:
A. If the ratio of the largest observation to the smallest observation of a data set is very close to 1, power transformations with T from -1 to 1 have a large effect.
B. For data sets with zeroes, a power transformation with a T parameter less than zero will be most effective.
C. A trial and error method can be used to identify the value of the T parameter that will be most effective.
D. For a highly skewed distribution, a power transformation with the parameter T = 1 will be most effective.
C. A trial and error method can be used to identify the value of the T parameter that will be most effective.
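The trial and error idea in option C can be sketched as a loop over candidate T values. The data, the candidate list, and the use of moment-based skewness as a numeric stand-in for inspecting a plot are all assumptions for illustration; the convention of taking the logarithm at T = 0 and negating the result for T < 0 is also assumed here.

```python
import math

# Hypothetical positive, right-skewed data; invented for illustration.
data = [1.0, 2.0, 2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 30.0, 80.0]


def power_transform(values, t):
    """Apply a power transformation; t = 0 is treated as the natural log."""
    if t == 0:
        return [math.log(v) for v in values]
    sign = 1.0 if t > 0 else -1.0  # keep the ordering of the data when t < 0
    return [sign * v ** t for v in values]


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Try each candidate and keep the one whose transformed data is least skewed.
candidates = [-1, -0.5, 0, 0.5, 1]
skew_by_t = {t: skewness(power_transform(data, t)) for t in candidates}
best_t = min(candidates, key=lambda t: abs(skew_by_t[t]))
```

Note that T = 1 leaves the data unchanged, which is why it cannot help a highly skewed distribution.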
Robust estimation techniques are valuable for visualizing non-normal data. To assess whether the residuals for different groups of a non-normal data set may be pooled, distributions of the spread-standardized residuals may be graphed by normal q-q plots. Which one of the following descriptions correctly defines the spread-standardized residual?
Select one:
A. The difference between a transformed observation and its group median, divided by its group standard deviation
B. The difference between a transformed observation and its group median
C. The difference between a transformed observation and its group mean, divided by its group standard deviation
D. The difference between a transformed observation and its group median, divided by its group mean absolute deviation
D. The difference between a transformed observation and its group median, divided by its group mean absolute deviation
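A minimal sketch of the definition in option D follows. The groups and function names are hypothetical, and the card does not say which center the mean absolute deviation is measured around; using the group median is an assumption.

```python
import statistics

# Hypothetical transformed observations by group; invented for illustration.
groups = {
    "a": [1.0, 2.0, 3.0, 4.0, 10.0],
    "b": [10.0, 20.0, 30.0, 40.0, 100.0],
}


def mean_absolute_deviation(values, center):
    """Average absolute distance from the given center."""
    return sum(abs(v - center) for v in values) / len(values)


def spread_standardized_residuals(values):
    """(observation - group median) / group mean absolute deviation."""
    center = statistics.median(values)
    spread = mean_absolute_deviation(values, center)
    return [(v - center) / spread for v in values]


standardized = {name: spread_standardized_residuals(vals) for name, vals in groups.items()}
```

In this invented example group b is group a multiplied by 10, so the standardized residuals of the two groups coincide. That is the situation in which pooling the residuals would look reasonable.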
A data set of observations quantifying mobile phone battery life is skewed toward large values. It is most likely that
Select one:
A. The values on a quantile plot are symmetric.
B. The quantile plot displays a convex pattern.
C. The distribution is well-approximated by the normal distribution.
D. The median and the mean measure the same aspect of the distribution.
B. The quantile plot displays a convex pattern.
Quantiles are essential to visualizing distributions. Which one of the following statements is true of quantiles?
Select one:
A. The precise form of f_i is important.
B. A fraction f of the data is greater than q(f).
C. No explicit rule is needed to compute q(f).
D. The f-values provide a standard for comparison.
D. The f-values provide a standard for comparison.
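To make the role of f-values concrete, the sketch below assigns f_i = (i - 0.5) / n to the i-th sorted observation and interpolates q(f) between them. That formula is one common convention and is assumed here, as are the sample values and function names.

```python
def f_values(n):
    """f_i = (i - 0.5) / n for i = 1..n; one common convention, assumed here."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def quantile(values, f):
    """Interpolate q(f) between the sorted values, clamping at the extremes."""
    ordered = sorted(values)
    n = len(ordered)
    pos = min(max(n * f + 0.5, 1.0), float(n))  # 1-based position
    lower = int(pos)
    if lower >= n:
        return ordered[-1]
    frac = pos - lower
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


small = [4.0, 1.0, 3.0, 2.0]            # hypothetical sample of size 4
large = [50.0, 10.0, 40.0, 20.0, 30.0]  # hypothetical sample of size 5

# The same f-value picks out comparable points in samples of different sizes.
medians = (quantile(small, 0.5), quantile(large, 0.5))
```

The two samples have different sizes, yet q(0.5) is defined for both, which is the sense in which f-values give a standard for comparison.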
The Review Process includes which one of the following activities?
Select one:
A. Rank results with respect to business success criteria.
B. Identify misleading steps.
C. Determine deployment strategy.
D. Select best model.
B. Identify misleading steps.
Which one of the following categories of data quality measures how well data represents true values and the business information being analyzed?
Select one:
A. Accuracy
B. Reasonability
C. Validity
D. Timeliness
A. Accuracy
Which one of the following is true in regard to using analytic tools to identify atypical values for a particular variable?
Select one:
A. A common formula for standardizing a variable is to subtract the mean and multiply by the standard deviation.
B. Variance and standard deviation are scale dependent and increase as the scale of a variable increases without the relative variability increasing.
C. The rule of thumb that values more than three standard deviations above or below the mean are outliers is particularly applicable for heavy-tailed insurance data.
D. An unusually narrow range (given the number of values) or few extreme minimum or maximum values will suggest the presence of outliers.
B. Variance and standard deviation are scale dependent and increase as the scale of a variable increases without the relative variability increasing.
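The scale dependence in option B can be checked numerically. In the sketch below the heights are invented, and the coefficient of variation is used as one possible measure of relative variability; that choice is an assumption, not something the card specifies. The `standardize` helper also shows the usual form of standardization, which divides by the standard deviation.

```python
import statistics

# Hypothetical heights in meters, then the same heights in centimeters.
meters = [1.5, 1.6, 1.7, 1.8, 1.9]
centimeters = [m * 100 for m in meters]


def standardize(values):
    """z-score: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def coefficient_of_variation(values):
    """Standard deviation relative to the mean, which does not depend on units."""
    return statistics.stdev(values) / statistics.mean(values)


# Changing units multiplies the standard deviation by the same factor...
sd_ratio = statistics.stdev(centimeters) / statistics.stdev(meters)
# ...but leaves the relative variability and the z-scores unchanged.
cv_m = coefficient_of_variation(meters)
cv_cm = coefficient_of_variation(centimeters)
z_m = standardize(meters)
z_cm = standardize(centimeters)
```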
A stakeholder analysis is undertaken by an insurer’s data governance committee because
Select one:
A. Data is received on different bases and broken down by several variables.
B. Various departments have similar demands for types and formats of collected data.
C. Stakeholders come to a consensus on their expectations of how data should be handled.
D. Mergers with legacy systems are considered essential to users of insurance data.
A. Data is received on different bases and broken down by several variables.
Web scraping transforms
Select one:
A. Small amounts of data from the internet.
B. Structured data into relational databases.
C. Unstructured data into structured data.
D. Internet data into a library.
C. Unstructured data into structured data.
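As a small illustration of turning page markup into structured records, the sketch below parses a hard-coded HTML snippet with Python's standard `html.parser` module. The snippet, the `TableParser` class, and the field names are hypothetical; a real scraper would first download the page and would need to handle messier markup.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet standing in for a downloaded page.
PAGE = """
<table>
  <tr><th>state</th><th>claims</th></tr>
  <tr><td>OH</td><td>12</td></tr>
  <tr><td>PA</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True      # text that follows belongs to a cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())


parser = TableParser()
parser.feed(PAGE)

# Use the first row as field names and the remaining rows as records.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
```

Each record is now a dictionary keyed by column name, a form that can be loaded into a table or database.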
The 5 C’s are
Consent
Clarity
Consistency
Control
Consequences