SCENARIO VII - Domain V, Competency B Flashcards
1
Q
Amy is a data scientist who works for a health care company in France. She is asked to develop a plan to build a
machine learning model that predicts patient satisfaction with doctors based on an analysis of the doctors’
notes. The doctors are from various countries, and the notes are in free-form text, compiled in a database in
Germany.
A
Please use the following scenario to answer the next TWO questions.
2
Q
- Which of the following should be Amy’s first step?
A. Clean the data to remove any outliers and missing values.
B. Perform a data lineage analysis to determine where the data originated.
C. Start building a predictive model that can be used to assess client satisfaction.
D. Examine the data for unexpected insights, concepts and semantic relationships.
A
- The correct answer is B. Because the EU has strong privacy and data protection laws, privacy and data
transfer laws must be considered first. EU, French and German laws must be reviewed before any other
activity is performed. This is done by reviewing where the data originates, where it is transferred,
what laws govern that path and how the data will be used. While removing outliers and missing values and
examining insights, concepts and semantic relationships are vital parts of the AI development process, the law
may prohibit the processing or movement of the data entirely (e.g., from France to Germany).
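As a rough illustration of the lineage review described in this answer, the sketch below records each hop the data takes and flags the hops that cross a border, since those are the ones that need a legal check. The class name, function name, dataset name and country codes are all hypothetical; the scenario does not say which countries the doctors are in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    dataset: str
    origin_country: str       # where the data was created or last held
    destination_country: str  # where the data is stored or processed next


def cross_border_steps(steps):
    """Return the steps where data moves from one country to another."""
    return [s for s in steps if s.origin_country != s.destination_country]


# Hypothetical path, for illustration only.
steps = [
    LineageStep("doctor_notes", "FR", "DE"),  # notes written in France, stored in Germany
    LineageStep("doctor_notes", "DE", "DE"),  # processing inside Germany
    LineageStep("doctor_notes", "DE", "FR"),  # analysis by the team in France
]

flagged = cross_border_steps(steps)
for step in flagged:
    print(f"Review transfer rules: {step.dataset} "
          f"{step.origin_country} -> {step.destination_country}")
```

A real lineage analysis would also capture the purpose of use and the legal basis for each step; this sketch only shows the idea of tracing the path before any cleaning or modeling starts.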
3
Q
- Assuming Amy has complied with all privacy and data protection laws, what type of machine learning is
Amy most likely to use to predict patient satisfaction?
A. Regression.
B. Decision tree.
C. Dimensionality reduction.
D. Natural language processing.
A
- The correct answer is D. Since the doctors’ notes are in free-form text, natural language processing (NLP)
would be the most likely type of machine learning used. NLP is used specifically for analyzing text and
natural language. Regression is typically used for predicting numerical values from underlying numeric
variables. A decision tree is typically used for classification where a limited number of variables are
involved. Dimensionality reduction is used to reduce the number of variables and other extraneous factors,
so it is not likely to be used, given the other options.
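To make the contrast in this answer concrete, here is a minimal sketch of one basic NLP step, a bag-of-words encoding, which turns free-form text into the numeric features that methods such as regression or decision trees expect. The two notes are invented examples, the function names are hypothetical, and the tokenizer assumes English text, whereas the scenario's notes may be in several languages.

```python
import re
from collections import Counter


def tokenize(note):
    """Lowercase a note and split it into simple word tokens."""
    return re.findall(r"[a-z']+", note.lower())


def bag_of_words(note, vocabulary):
    """Count how often each vocabulary word appears in the note."""
    counts = Counter(tokenize(note))
    return [counts[word] for word in vocabulary]


# Invented notes, for illustration only.
notes = [
    "Patient was calm and thanked the staff.",
    "Patient was upset about the long wait.",
]

# The vocabulary is every distinct word seen across the notes, in sorted order.
vocabulary = sorted({word for note in notes for word in tokenize(note)})
vectors = [bag_of_words(note, vocabulary) for note in notes]

for note, vector in zip(notes, vectors):
    print(vector, "<-", note)
```

Each note becomes a fixed-length list of word counts. Without a text-processing step like this, the free-form notes offer no numeric variables for the other options to work on.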
Body of Knowledge Domain V, Competency B