How Google Does Machine Learning Flashcards
Which of the following are best practices for data quality management?
- Preventing duplicates
- All options are correct.
- Automating data entry
- Resolving missing values
- All options are correct.
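As a rough illustration of two of the practices in the card above (preventing duplicates and resolving missing values), here is a minimal sketch in plain Python. The records, the field names, and the choice to fill a missing age with the mean of the known ages are all assumptions made for the example, not something the course prescribes.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"id": 1, "city": "Paris", "age": 34},
    {"id": 2, "city": "Lyon", "age": None},   # missing value
    {"id": 1, "city": "Paris", "age": 34},    # exact duplicate of the first record
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing(rows, field, default):
    """Return copies of rows with None in `field` replaced by `default`."""
    return [
        {**row, field: default if row[field] is None else row[field]}
        for row in rows
    ]

unique = drop_duplicates(records)

# Assumed imputation strategy: use the mean of the values that are present.
known_ages = [r["age"] for r in unique if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)
cleaned = fill_missing(unique, "age", mean_age)
```

Mean imputation is only one option; dropping incomplete rows or flagging them for review may suit other data sets better.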
Which of the following are categories of data quality tools?
- Monitoring tools
- Neither option is correct
- Both Cleaning tools and Monitoring tools
- Cleaning tools
- Both Cleaning tools and Monitoring tools
Which of the following is not a Data Quality attribute?
- Accuracy
- Redundancy
- Auditability
- Consistency
- Redundancy
What are the features of low data quality?
- Incomplete data
- All options are correct
- Duplicated data
- Unreliable info
- All options are correct
Which of the following refers to the Orderliness of data?
- The data entered has the required format and structure
- None of the options are correct.
- The data represents reality within a reasonable period
- The data record with specific details appears only once in the database
- The data entered has the required format and structure
Which of the following types of machine learning model uses labels, in other words, the correct answers to whatever it is that we want to learn to predict?
- Unsupervised Model
- Supervised Model
- None of the options are correct
- Reinforcement Model
- Supervised Model
Which statement is true?
- The problem you are trying to solve, the data you have, explainability, etc. will not determine which machine learning methods you use to find a solution.
- The problem you are trying to solve, the data you have, explainability, etc. will determine which machine learning methods you use to find a solution.
- None of the options are correct.
- Determining which machine learning methods you use to find a solution depends only on the problem or hypothesis.
- The problem you are trying to solve, the data you have, explainability, etc. will determine which machine learning methods you use to find a solution.
What is a type of Supervised machine learning model?
- Regression model.
- Classification model.
- None of the options are correct.
- Regression models & Classification models
- Regression models & Classification models
Which model would you use if your problem required a discrete number of values or classes?
- Regression Model
- Unsupervised Model
- Classification Model
- Supervised Model
- Classification Model
When the data isn’t labelled, what is an alternative way of predicting the output?
- Clustering Algorithms
- Linear regression
- None of the options are correct.
- Logistic regression
- Clustering Algorithms
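To make the idea of grouping unlabelled data concrete, the following is a minimal sketch of one clustering algorithm, k-means, on one-dimensional data. The function name `kmeans_1d`, the sample values, and the starting centroids are assumptions for illustration; the card itself does not name a specific algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster,
        # leaving it unchanged if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# No labels are supplied; the groups emerge from the values alone.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```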
Which of the following is not true about Exploratory Data Analysis?
- Does not provide insight into the data.
- Deals with unknowns.
- Discovers new knowledge.
- Generates a posteriori hypothesis.
- Does not provide insight into the data.
What are the objectives of exploratory data analysis?
- Uncover a parsimonious model, one which explains the data with a minimum number of predictor variables.
- All options are correct.
- Gain maximum insight into the data set and its underlying structure.
- Check for missing data and other mistakes
- All options are correct.
Exploratory Data Analysis is mainly performed using the following methods:
- Both Univariate and Bivariate
- None of the options are correct.
- Bivariate
- Univariate
- Both Univariate and Bivariate
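The difference between the two methods in the card above can be sketched with the standard library alone. Univariate analysis summarises one variable at a time, while bivariate analysis looks at the relationship between two. The `hours` and `scores` data and the `pearson` helper are hypothetical and exist only for this example.

```python
import statistics

# Hypothetical paired observations.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

# Univariate: describe a single variable on its own.
summary = {
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "stdev": statistics.stdev(hours),
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Bivariate: measure how two variables move together.
r = pearson(hours, scores)
print(summary)
print(round(r, 3))
```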
Which of the following is not a component of Exploratory Data Analysis?
- Statistical Analysis and Clustering
- Hyperparameter tuning
- Anomaly Detection
- Accounting and Summarizing
- Hyperparameter tuning
Which is the correct sequence of steps in data analysis and data visualisation of Exploratory Data Analysis?
- Data Exploration -> Model Building -> Present Results -> Data Cleaning
- Data Exploration -> Model Building -> Data Cleaning -> Present Results
- Data Exploration -> Data Cleaning -> Model Building -> Present Results
- Data Exploration -> Data Cleaning -> Present Results -> Model Building
- Data Exploration -> Data Cleaning -> Model Building -> Present Results
To predict the continuous value of our label, which of the following algorithms is used?
- Classification
- Unsupervised
- None of the options are correct.
- Regression
- Regression
We can minimize the error between our predicted continuous value and the label’s continuous value using which model?
- Regression
- Both regression and classification
- None of the options are correct.
- Classification
- Regression
What is the most essential metric a regression model uses?
- Mean squared error as its loss function
- Both mean squared error as its loss function and cross entropy
- None of the options are correct.
- Cross entropy
- Mean squared error as its loss function
If we want to minimize the error or misclassification between our predicted class and the label's class, which of the following models can be used?
- Regression
- Classification
- None of the options are correct.
- Categorical
- Classification
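The two loss functions named in the cards above can be written out in a few lines. This is a minimal sketch, assuming binary labels for the cross-entropy case; the function names and the `eps` clipping value are choices made for the example rather than anything taken from the course.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between continuous labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels given predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Regression: continuous labels, so the error is a numeric distance.
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])  # (1 + 4) / 2 = 2.5

# Classification: class labels, so the error is measured on predicted probabilities.
bce = binary_cross_entropy([1, 0], [0.5, 0.5])    # equals ln 2, about 0.693
```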