SCENARIO VII - Domain V, Competency B Flashcards

1
Q

Amy is a data scientist who works for a health care company in France. She is asked to develop a plan to build a
machine learning model that predicts patient satisfaction with doctors based on an analysis of the doctors’
notes. The doctors are from various countries, and the notes are in free-form text, compiled in a database in
Germany.

A

Please use the scenario above to answer the next TWO questions.

2
Q
  1. Which of the following is the best answer for what Amy’s first step should be?
    A. Clean the data to remove any outliers and missing values.
    B. Perform a data lineage analysis to determine where the data originated.
    C. Start building a predictive model that can be used to assess client satisfaction.
    D. Examine the data for unexpected insights, concepts and semantic relationships.
A
  1. The correct answer is B. Given the strong privacy and data protection laws in the EU, privacy and data
    transfer laws must be considered first. EU, French and German laws must be reviewed before any other
    activity is performed. This is achieved by reviewing where the data originates, where it is transferred,
    what laws control this path and how the data will be used. While removing outliers and missing values, and
    examining insights, concepts and semantic relationships, are vital aspects of the AI development process,
    the law may disallow the processing or movement of the data entirely (e.g., between Germany and France).
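
    As a rough illustration of a data lineage review, the sketch below records the path the data takes and
    lists every hop that crosses a border, so the applicable transfer rules can be reviewed first. The
    LineageHop class, its fields and the cross_border_transfers function are hypothetical names chosen for
    this example, and the two-hop path is an assumption based on the scenario (notes stored in Germany,
    modeling done in France). It is not a legal compliance check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageHop:
    """One step in the path the data takes (hypothetical structure)."""
    step: str     # e.g. "stored", "processed"
    country: str  # ISO 3166-1 alpha-2 country code


def cross_border_transfers(path):
    """Return (from_country, to_country, step) for each hop that crosses a border."""
    transfers = []
    for prev, curr in zip(path, path[1:]):
        if prev.country != curr.country:
            transfers.append((prev.country, curr.country, curr.step))
    return transfers


# Assumed path for the scenario: notes stored in Germany, modeled in France.
path = [
    LineageHop("stored", "DE"),
    LineageHop("processed", "FR"),
]

for src, dst, step in cross_border_transfers(path):
    print(f"Review transfer rules: {src} -> {dst} before data is {step}")
```

    Each flagged hop is a point where Amy would need to confirm which laws apply before moving on to
    cleaning, exploration or modeling.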
3
Q
  1. Assuming Amy has complied with all privacy and data protection laws, what type of machine learning is
    Amy most likely to use to predict patient satisfaction?
    A. Regression.
    B. Decision tree.
    C. Dimensionality reduction.
    D. Natural language processing.
A
  1. The correct answer is D. Since the doctors’ notes are in free-form text, natural language processing would
    be the most likely type of machine learning used. NLP is used specifically for analyzing text and natural
    language. Regression is typically used for predicting numerical values from underlying variables that are
    themselves numeric. A decision tree is typically used for classification where a limited number of
    variables are involved. Dimensionality reduction is used to reduce the number of variables and other
    extraneous factors, so it is not likely to be the primary technique here, given the other options.
    Body of Knowledge Domain V, Competency B
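
    To make the NLP answer concrete, here is a minimal sketch of one common first step: turning free-form
    notes into numeric word counts (a bag-of-words representation) that a downstream model could use. The
    function names and the two sample notes are made up for illustration and are not data from the scenario.
    The tokenizer assumes English text with ASCII letters; notes written by doctors from various countries
    would likely need language-aware tokenization.

```python
import re
from collections import Counter


def tokenize(note):
    """Lowercase a free-form note and split it into word tokens (ASCII letters only)."""
    return re.findall(r"[a-z']+", note.lower())


def bag_of_words(note, vocabulary):
    """Count how often each vocabulary word appears in the note."""
    counts = Counter(tokenize(note))
    return [counts[word] for word in vocabulary]


# Made-up notes for illustration only.
notes = [
    "Patient was calm and thanked the staff.",
    "Patient complained about the long wait.",
]

# Build a shared vocabulary, then represent every note as a count vector.
vocabulary = sorted({tok for note in notes for tok in tokenize(note)})
features = [bag_of_words(note, vocabulary) for note in notes]
```

    Each row of features has one count per vocabulary word, which is the kind of numeric input that a
    satisfaction-prediction model would be trained on.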