Data Science Flashcards

Question 1

Q

What is the most common approach to transform categorical variables into numeric ones?
Select one:
a. Frequency count
b. One-hot encoding
d. “Bayesian” encoders
e. Contrast encoders

Answer

A

b. One-hot encoding
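
As a rough illustration of the answer above, one-hot encoding creates one 0/1 column per category. The sketch below uses plain Python and an invented `colors` list; the helper name `one_hot_encode` is hypothetical (in practice a library routine such as pandas `get_dummies` is commonly used instead).

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has a single 1 in its category's column."""
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]  # hypothetical categorical column
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
for row in encoded:
    print(row)
```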

Question 2

Q

In what situation should observations with anomalies in values (outliers) be removed?
Select one:
a. When outliers represent more than 90% of the existing number of observations
b. When outliers distort models more than they help, as is the case with models built with
numerical algorithms, such as K-Means or K-Medoids
c. When outliers represent more than 75% of the existing number of observations
d. When the algorithms used do not deal well with outliers, as is the case with algorithms based on
decision trees

Answer

A

b. When outliers distort models more than they help, as is the case with models built with
numerical algorithms, such as K-Means or K-Medoids
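
One way to see the distortion in the K-Means case: a K-Means centroid is the mean of the points assigned to a cluster, so a single extreme value can pull it far from the bulk of the data. The one-dimensional values below are invented for illustration only.

```python
from statistics import mean

# Hypothetical one-dimensional cluster, with and without a single outlier.
points = [1.0, 2.0, 3.0, 4.0, 5.0]
points_with_outlier = points + [100.0]

centroid_clean = mean(points)                   # 3.0
centroid_outlier = mean(points_with_outlier)    # 115 / 6, about 19.17

print(centroid_clean, round(centroid_outlier, 2))
```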

Question 3

Q

What is the primary use of Data Visualization in Data Science?
a. To build machine learning models
b. To clean the data
c. To help understand and interpret the data better by representing it in a graphical format
d. To produce attractive graphics for publications

Answer

A

c. To help understand and interpret the data better by representing it in a graphical format

Question 4

Q

When evaluating a descriptive analytics model using CRISP-DM, which of the following aspects is
often considered?
a. The quality of insights extracted and their relevance to the business problem
b. The predictive power of the model
d. The speed of the model in production
e. The accuracy of the clustering algorithms

Answer

A

a. The quality of insights extracted and their relevance to the business problem

Question 5

Q

Which of these tasks is NOT part of the Business Understanding phase of CRISP-DM?
Select one:
a. Produce the project plan
c. Defining business objectives
d. Data exploration
e. Determine Data Mining objectives
f. Assess the situation

Answer

A

d. Data exploration

Question 6

Q

Which of the following does NOT represent a valid reason for a dataset to be normalized?
Select one:
a. When the target is between 0 and 1
b. When one wants to interpret the coefficients of linear models
d. Improve an algorithm’s training speed by helping to accelerate convergence, even in algorithms that
do not require data to be normalized
e. When using algorithms that measure the distance (e.g., Euclidean distance) between data points

Answer

A

a. When the target is between 0 and 1
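
The distance-based reason listed in option e can be sketched as follows. With invented `ages` and `incomes` columns, the raw Euclidean distance is dominated by the larger-scaled feature, while min-max scaling (one common normalization, chosen here as an assumption) puts both features on the same footing.

```python
import math


def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    low, high = min(column), max(column)
    return [(x - low) / (high - low) for x in column]


ages = [20, 30, 40]              # hypothetical feature, small scale
incomes = [20000, 50000, 80000]  # hypothetical feature, large scale

scaled_ages = min_max_scale(ages)        # [0.0, 0.5, 1.0]
scaled_incomes = min_max_scale(incomes)  # [0.0, 0.5, 1.0]

# Distance between the first two observations, before and after scaling.
raw_distance = math.dist((ages[0], incomes[0]), (ages[1], incomes[1]))
scaled_distance = math.dist(
    (scaled_ages[0], scaled_incomes[0]), (scaled_ages[1], scaled_incomes[1])
)

print(round(raw_distance, 2))     # roughly 30000, almost entirely income
print(round(scaled_distance, 3))  # about 0.707, both features contribute equally
```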

Question 7

Q

What is SQL often used for in the context of data science?
a. To manipulate and retrieve data stored in relational databases
b. To create interactive user interfaces
c. To design the output of descriptive models
e. To create 3D graphics

Answer

A

a. To manipulate and retrieve data stored in relational databases
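
A minimal sketch of this use, assuming a made-up `sales` table held in an in-memory SQLite database (Python's standard `sqlite3` module): the data is inserted and then retrieved in aggregated form with a single query.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ana", 10.0), ("Ana", 15.0), ("Rui", 7.5)],
)

# Retrieve the total amount spent per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)  # [('Ana', 25.0), ('Rui', 7.5)]
```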

Question 8

Q

Which of the following is NOT a characteristic of CRISP-DM?
Select one:
a. Used by academia and by practitioners
c. Follows the waterfall project management standards defined by the Project Management Institute
d. It is independent of the tools used
e. Can be applied to different types of problems
f. Non-Proprietary (not patented or owned by an organization)

Answer

A

c. Follows the waterfall project management standards defined by the Project Management Institute

Question 9

Q

Select the correct type of data.
a. {name: “Maria”, age: 32, city: “London”};
b. Image file
c. A column in a dataset with the product brand (rows are transactions)
d. A column in a dataset with the customer total purchase amount, in Euros (rows are based on
customers’ behavior and characteristics)

Structured data - Numerical
Semi-structured data - JSON
Structured data - Categorical
Unstructured data - Multimedia

Answer

A

a. Semi-structured data - JSON
b. Unstructured data - Multimedia
c. Structured data - Categorical
d. Structured data - Numerical

Question 10

Q

A histogram chart is helpful to…
a. Analyze variations over time
b. Make comparisons between two variables
d. Analyze the distribution of data points
e. Analyze the relationship between two variables

Answer

A

d. Analyze the distribution of data points
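
To make the idea concrete, a histogram groups values into bins and counts how many fall in each. The sketch below uses invented `ages` and a fixed bin width of 10 (both assumptions) and prints a crude text histogram.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


ages = [21, 25, 29, 33, 35, 38, 39, 41, 58]  # hypothetical data
bins = histogram(ages, 10)

print(bins)  # {20: 3, 30: 4, 40: 1, 50: 1}
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```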

Question 11

Q

What is the main goal of the ‘Deployment’ phase in the CRISP-DM methodology?
a. To recollect the data for analysis
c. To evaluate the success of data cleaning
d. To explain the results of exploratory data analysis
e. To push the developed model into a production or operational environment where it can provide the intended business benefits

Answer

A

e. To push the developed model into a production or operational environment where it can provide the intended business benefits

Question 12

Q

Which chart should be used to visualize the distribution between three numeric variables?

Answer

A

3D area chart

Question 13

Q

In an RFM segmentation where segments are created using quartiles, typically, “Good customers, but almost churned” (do not make purchases for some time, but buy frequently and spend the most) are situated in which segment?
Select one:
a. 344
b. 444
c. 144
e. 411
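
How quartile-based RFM codes arise can be sketched as below. The card does not state its scoring convention, so this sketch assumes one: each of Recency, Frequency and Monetary is scored 1 to 4, with 4 the best quartile (most recent, most frequent, highest spend). The customer data and the helper `quartile_scores` are invented for illustration, and the rank-based split is a simplification that assumes no ties.

```python
def quartile_scores(values, higher_is_better=True):
    """Score each value 1-4 by rank-based quartile; 4 is the best quartile."""
    order = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not higher_is_better
    )
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 4) // len(values)
    return scores


# Hypothetical customers: days since last purchase, purchase count, total spend.
recency_days = [5, 300, 40, 200, 15, 90, 120, 60]
frequency = [2, 30, 12, 1, 20, 5, 8, 3]
monetary = [50, 900, 400, 20, 700, 100, 250, 80]

r = quartile_scores(recency_days, higher_is_better=False)  # fewer days is better
f = quartile_scores(frequency)
m = quartile_scores(monetary)
segments = [f"{ri}{fi}{mi}" for ri, fi, mi in zip(r, f, m)]

print(segments)
# ['411', '144', '333', '111', '444', '222', '233', '322']
```

Under this assumed convention, the second customer (300 days since the last purchase, but the highest frequency and spend) receives a low recency score and high frequency and monetary scores.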

Question 14

Q

How does Cosine Similarity differ from Euclidean distance?
a. Cosine Similarity can only be used when all data points are positive
b. Cosine Similarity can only be calculated in two dimensions
c. Cosine Similarity is used only with unstructured data.
e. Cosine Similarity takes into account the direction of data (the angle between data point vectors), and not the magnitude, making it useful when the dataset includes text or other high-dimensional data

Answer

A

e. Cosine Similarity takes into account the direction of data (the angle between data point vectors), and not the magnitude, making it useful when the dataset includes text or other high-dimensional data
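
A small sketch of the difference, using two invented vectors that point in the same direction but differ in magnitude: cosine similarity treats them as identical, while Euclidean distance reports them as far apart.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


a = (1.0, 2.0, 3.0)
b = (10.0, 20.0, 30.0)  # same direction as a, ten times the magnitude

cosine = cosine_similarity(a, b)  # approximately 1.0: same direction
euclidean = math.dist(a, b)       # sqrt(1134), about 33.67: far apart

print(round(cosine, 6), round(euclidean, 2))
```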

Question 15

Q

What does the line marked inside the red oval represent in the following boxplot?
Select one:
a. The quartile 3 value plus the value of the interquartile range multiplied by 1.5
b. Quartile 2
c. The quartile 1 value minus the value of the interquartile range multiplied by 1.5
d. Quartile 3

Answer

A

a. The quartile 3 value plus the value of the interquartile range multiplied by 1.5
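
The quantity in the answer can be computed as sketched below, on invented data. Quartile definitions vary between tools; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`. Many plotting libraries also draw the whisker at the most extreme data point inside this limit rather than at the limit itself.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical sample with one extreme value

q1, q2, q3 = quantiles(data, n=4, method="inclusive")  # 3.0, 5.0, 7.0
iqr = q3 - q1                                          # 4.0

upper_fence = q3 + 1.5 * iqr  # 13.0
lower_fence = q1 - 1.5 * iqr  # -3.0

# Points beyond the fences are the ones a boxplot draws as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(upper_fence, lower_fence, outliers)  # 13.0 -3.0 [30]
```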

Question 16

Q

Which of the following are typical applications of Market Basket Analysis? Select one or more:
a. Improve product placement in stores
b. Identifying cross-selling opportunities
c. Segment customers into groups with similar characteristics
d. Identifying up-selling opportunities

Answer

A

a. Improve product placement in stores
b. Identifying cross-selling opportunities
d. Identifying up-selling opportunities
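
The applications above typically rest on association-rule measures. As a minimal sketch with invented transactions, the support, confidence and lift of the hypothetical rule "bread -> butter" can be computed like this:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

support_both = support(transactions, {"bread", "butter"})     # 0.5
confidence = support_both / support(transactions, {"bread"})  # about 0.667
lift = confidence / support(transactions, {"butter"})         # about 1.333

print(support_both, round(confidence, 3), round(lift, 3))
```

A lift above 1 suggests the two items appear together more often than independence would predict, which is the kind of signal used for placement and cross-selling decisions.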

Question 17

Q

What is the primary use of Data Science in Marketing?
b. To understand and predict customer behaviors and improve marketing strategies
c. To entertain customers
d. To develop high-tech products
e. To retain talented employees

Answer

A

b. To understand and predict customer behaviors and improve marketing strategies

Question 18

Q

Which of the following statements describes the Data Preparation phase in CRISP-DM?
Select one:
a. Phase that covers all the activities necessary to build the final dataset (data that will be used in the
modeling tool(s)) from the initial raw data
c. Phase in which various modeling techniques are selected and applied. The parameters of the various techniques are also calibrated to ideal values
d. Phase that focuses on understanding the objectives and requirements of the project from a business perspective, converting this knowledge into the definition of the Data Mining problem and the
development of the preliminary plan to achieve the objectives
e. Phase that begins with the initial data extraction and continues with activities that allow familiarization with the data, identifying data quality problems, discovering the first information about the data and/or detecting interesting subsets to form hypotheses about hidden information

Answer

A

a. Phase that covers all the activities necessary to build the final dataset (data that will be used in the
modeling tool(s)) from the initial raw data

(18 cards)