ML Foundation Set 2 (Answers) Flashcards
Which of the following is NOT supervised learning?
a. PCA
b. Decision Tree
c. Linear Regression
d. Naive Bayesian
a. PCA
Which of the following statements about Naive Bayes is incorrect?
a. Attributes are equally important.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes are statistically independent of one another given the class value.
d. Attributes can be nominal or numeric
b. Attributes are statistically dependent on one another given the class value.
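Conditional independence given the class is what lets Naive Bayes score a class as the prior multiplied by one likelihood per attribute. A minimal sketch, assuming a hypothetical toy spam model with two yes/no attributes (all probabilities below are invented for illustration):

```python
def naive_bayes_posteriors(priors, likelihoods, instance):
    """Return normalised P(class | instance) under the Naive Bayes assumption.

    priors:      {class: P(class)}
    likelihoods: {class: [{value: P(attribute_i = value | class)}, ...]}
    instance:    tuple of attribute values, one per attribute
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        # Independence given the class: multiply the per-attribute likelihoods.
        for table, value in zip(likelihoods[cls], instance):
            score *= table.get(value, 0.0)
        scores[cls] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("every class has zero likelihood for this instance")
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical toy model: attributes are ("contains link", "contains 'free'").
PRIORS = {"spam": 0.4, "ham": 0.6}
LIKELIHOODS = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"yes": 0.7, "no": 0.3}],
    "ham": [{"yes": 0.1, "no": 0.9}, {"yes": 0.2, "no": 0.8}],
}

print(naive_bayes_posteriors(PRIORS, LIKELIHOODS, ("yes", "yes")))
```

Here spam scores 0.4 × 0.8 × 0.7 and ham scores 0.6 × 0.1 × 0.2 before normalisation.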
Among the following options, identify the one which is not a type of machine learning.
a. Semi unsupervised learning
b. Supervised learning
c. Reinforcement learning
d. unsupervised learning
a. Semi unsupervised learning
Identify the kind of learning task involved in recognising “facial identities and facial expressions”.
a. Prediction
b. Recognise patterns
c. Recognising anomalies
d. Generating patterns
b. Recognise patterns
Identify the model that is trained on data in only a single batch.
a. Offline learning
b. Batch learning
c. Both A and B
d. None
c. Both offline learning and batch learning
What is the application of machine learning methods to a large database called?
a. Big data computing
b. Internet of things
c. Data mining
d. Artificial intelligence
c. Data mining
Identify the type of learning in which labelled training data is used.
a. Clustering
b. Supervised learning
c. Reinforcement learning
d. unsupervised learning
b. Supervised learning
Identify whether true or false: In PCA, the number of principal components is equal to the number of input dimensions.
a. True
b. False
a. True
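PCA produces one principal component per input dimension: the components are the eigenvectors of the covariance matrix, and each eigenvalue is the variance that component captures (some may be zero). A minimal sketch for 2-D data, using the closed-form eigenvalues of a symmetric 2×2 matrix rather than a linear-algebra library:

```python
import math


def principal_variances_2d(points):
    """Eigenvalues of the 2x2 sample covariance matrix, largest first.

    Each eigenvalue is the variance along one principal component, so 2-D
    input always yields exactly two components.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance entries (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return [half_trace + radius, half_trace - radius]


print(principal_variances_2d([(0, 0), (1, 1), (2, 2)]))
```

For perfectly collinear points the second component exists but carries zero variance, which is why dimensionality reduction can drop it.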
Which of the following does dimensionality reduction reduce?
a. Performance
b. Entropy
c. Stochastics
d. Collinearity
d. Collinearity
Which of the following machine learning algorithms is based upon the idea of bagging?
a. Decision tree
b. Random forest
c. SVM
d. Regression
b. Random forest
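Bagging (bootstrap aggregating) trains each model on a sample drawn with replacement and combines predictions by majority vote; random forests apply this to decision trees and additionally sample features at each split, which this sketch omits. The base learner `fit_stump` is a hypothetical 1-D threshold classifier chosen only to keep the example short, and it assumes positives have larger `x` than negatives:

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Hypothetical base learner: threshold halfway between the class means.

    Assumes class 1 lies at larger x than class 0.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:  # degenerate bootstrap: predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def bagged_predict(train, x, n_models, fit, rng):
    """Fit n_models on bootstrap samples and return the majority vote for x."""
    votes = [fit(bootstrap_sample(train, rng))(x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]


TRAIN = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
rng = random.Random(0)
print(bagged_predict(TRAIN, 8.5, 25, fit_stump, rng))
```

Averaging over bootstrap samples mainly reduces variance, which is why bagging helps high-variance learners such as deep decision trees.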
Choose a disadvantage of decision trees:
a. Decision trees are robust to outliers
b. Factor analysis
c. Decision trees are prone to overfit
d. All of the above
c. Decision trees are prone to overfit
What is the term for the sample data on which machine learning algorithms build a model?
a. Data training
b. Training data
c. Transfer data
d. None of the above
b. Training data
Machine learning is a subset of which of the following?
a. Artificial intelligence
b. Deep learning
c. Data learning
d. None of the above
a. Artificial intelligence
Which of the following machine learning techniques helps in detecting the outliers in data?
a. Classification
b. Clustering
c. Anomaly detection
d. All of the above
c. Anomaly detection
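One simple anomaly-detection approach, shown here only as an illustration (many other methods exist), flags values whose z-score exceeds a threshold. The data below is invented; note that with only 10 points the largest possible population z-score is 3, so the sketch is called with a lower threshold:

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


DATA = [10, 11, 9, 10, 12, 10, 9, 11, 10, 50]
print(zscore_outliers(DATA, threshold=2.5))
```

Because a large outlier inflates the mean and standard deviation it is measured against, median-based scores are often preferred in practice.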
The researcher often called the "godfather of deep learning" is _____________
a. Geoffrey Everest Hinton
b. Geoffrey Hill
c. Geoffrey Chaucer
d. None of the above
a. Geoffrey Everest Hinton
The most significant phase in a genetic algorithm is _________
a. Mutation
b. Selection
c. Fitness function
d. Crossover
d. Crossover
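Crossover combines genetic material from two parents. A minimal sketch of single-point crossover, one common variant among several (two-point and uniform crossover are others), using string chromosomes:

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


print(single_point_crossover("00000000", "11111111", random.Random(1)))
```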
Which of the following are common classes of problems in machine learning?
a. Regression
b. Classification
c. Clustering
d. All of the above
d. All of the above
Among the following options, identify the one which is FALSE regarding regression.
a. It is used for prediction
b. It is used for interpretation
c. It relates inputs to outputs
d. It discovers causal relationships
d. It discovers causal relationships
Identify the successful applications of ML.
a. Learning to classify new astronomical structures
b. Learning to recognize spoken words
c. Learning to drive an autonomous vehicle
d. All of these choices
d. All of these choices
Identify the one that is NOT a numerical function among the function representations used in machine learning.
a. Case-based
b. Support vector machines
c. Linear regression
d. Neural network
a. Case-based
Which training examples does the FIND-S algorithm ignore?
a. Positive examples
b. Negative examples
c. Both
d. None
b. Negative examples
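FIND-S starts from the most specific hypothesis and generalises it only as far as needed to cover each positive example; negative examples are skipped. A minimal sketch over conjunctive hypotheses, where `"?"` means "any value", run on the EnjoySport training set from Mitchell's *Machine Learning* textbook:

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positives.

    examples: list of (attribute_tuple, is_positive).
    Returns None if there are no positive examples.
    """
    hypothesis = None  # stands for the empty, most specific hypothesis
    for attrs, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attrs)  # first positive: copy it verbatim
        else:
            # Generalise every attribute that disagrees with this positive.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attrs)]
    return hypothesis


ENJOY_SPORT = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(ENJOY_SPORT))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```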
Neuro software is ______
a. software used by neurosurgeons
b. designed to aid experts in the real world
c. powerful and easy-to-use neural network software
d. software used to analyze neurons
c. powerful and easy-to-use neural network software
Choose whether the following statement is true or false: The backpropagation rule is also known as the generalized delta rule.
a. True
b. False
a. True
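The (Widrow-Hoff) delta rule updates the weights of a single linear unit by `w_i <- w_i + lr * (target - output) * x_i`, which is gradient descent on squared error; backpropagation generalises this error term to hidden layers. A minimal sketch of the single-unit rule on invented, exactly linear data (`y = 1 + 2x`, with a constant bias input of 1):

```python
def delta_rule_epoch(weights, samples, lr):
    """One pass of the delta rule over (inputs, target) pairs for a linear unit."""
    w = list(weights)
    for x, target in samples:
        output = sum(wi * xi for wi, xi in zip(w, x))
        error = target - output
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w


# Inputs are (bias, x); targets follow y = 1 + 2x.
SAMPLES = [((1.0, float(x)), 1.0 + 2.0 * x) for x in range(4)]
w = [0.0, 0.0]
for _ in range(1000):
    w = delta_rule_epoch(w, SAMPLES, lr=0.05)
print(w)  # approaches [1.0, 2.0]
```

The learning rate has to stay small relative to the squared input norms, otherwise the updates overshoot, which is one source of the slow convergence listed in the next card.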
Choose the general limitations of the backpropagation rule among the following.
a. Slow convergence
b. Scaling
c. Local minima problem
d. All of these choices
d. All of these choices
Analysis of ML algorithms requires:
a. Statistical learning theory
b. Computational learning theory
c. Both A and B
d. None of the above
c. Both A and B
Choose the most widely used metrics and tools to assess the classification models.
a. The area under the ROC curve
b. Confusion matrix
c. Cost-sensitive accuracy
d. All of the above
d. All of the above
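A confusion matrix counts true/false positives and negatives, and accuracy, precision and recall are derived from those counts. A minimal sketch for a binary problem with invented labels; ROC AUC needs predicted scores rather than hard labels and is not shown:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(counts, summarize(*counts))
```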
Identify the difficulties with the k-nearest neighbour algorithm.
a. Curse of dimensionality
b. Computing the distance from the test case to every training case
c. Both A and B
d. None of the above
c. Both A and B
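A minimal k-nearest-neighbour classifier makes both difficulties visible: every prediction measures the distance to all training cases, and those Euclidean distances grow less informative as dimensions are added. The training points below are invented:

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query.

    train: list of (point_tuple, label). Sorting by distance touches every
    training case, so each prediction costs O(n) distance computations.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]


TRAIN = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(TRAIN, (0.5, 0.5), k=3))  # 'a'
```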
Which one of the following is also called exploratory learning?
a. supervised learning
b. active learning
c. unsupervised learning
d. reinforcement learning
c. unsupervised learning
In which of the following types of learning does the teacher return reward and punishment to the learner?
a. active learning
b. reinforcement learning
c. supervised learning
d. unsupervised learning
b. reinforcement learning
The output of training process in machine learning is __________
a. machine learning model
b. machine learning algorithm
c. null
d. accuracy
a. machine learning model
What does K stand for in the K-means algorithm?
a. Number of clusters
b. Number of data
c. Number of attributes
d. Number of iterations
a. Number of clusters
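K fixes how many centroids K-means maintains. A minimal sketch of Lloyd's algorithm; initialising with the first K points is an assumption made here for reproducibility (k-means++ or random restarts are common alternatives), and the points are invented:

```python
import math


def k_means(points, k, iterations=100):
    """Return k centroids found by alternating assignment and update steps."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


POINTS = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(k_means(POINTS, k=2))
```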
Which of the following is/are true?
1. The K-means algorithm is for clustering unlabeled data
2. KNN is unsupervised learning
3. The k in KNN stands for the number of clusters
4. None of the above
a. 1 & 3
b. 1, 2 & 3
c. 1 only
d. 4 only
c. 1 only
Which of the following is NOT a supervised learning setting?
a. Number of groups is known
b. Features of group explicitly stated
c. Neither features nor number of groups is known
d. none of the above
c. Neither features nor number of groups is known
Which of the following is FALSE about SVM?
a. SVM aims to maximise the margin between the separating hyperplane and the support vectors
b. SVMs achieve non-linear separation by reducing the dimension with a kernel
c. Kernels add additional dimensions to make linearly non-separable data separable.
d. Slack variables can be used to control the amount of allowed training errors
b. SVMs achieve non-linear separation by reducing the dimension with a kernel
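Kernels work by implicitly adding dimensions, not removing them. A minimal sketch with invented 1-D data that no single threshold separates (the negative class sits between positives); the map `phi(x) = (x, x^2)` makes it linearly separable, and the matching kernel computes the same inner product without building `phi`. The weights `w` and bias `b` are hand-picked for illustration, not trained by an SVM:

```python
def feature_map(x):
    """Explicit map phi(x) = (x, x^2): lifts 1-D input into 2-D."""
    return (x, x * x)


def kernel(x, z):
    """Equals phi(x) . phi(z), computed directly from the 1-D inputs."""
    return x * z + (x * z) ** 2


def linearly_separates(xs, labels, w, b):
    """True if sign(w . phi(x) + b) matches every +1 / -1 label."""
    for x, y in zip(xs, labels):
        score = sum(wi * fi for wi, fi in zip(w, feature_map(x))) + b
        if y * score <= 0:
            return False
    return True


XS = [-3, -2, -1, 0, 1, 2, 3]
LABELS = [1, 1, -1, -1, -1, 1, 1]
print(linearly_separates(XS, LABELS, w=(0.0, 1.0), b=-2.5))  # True
```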
What are the applications of Natural Language Processing (NLP)?
a. spam detection
b. sentiment analysis
c. movie recommendations
d. all the above
d. all the above
Which of the following is NOT an essential step for Deep Learning?
a. Data collection and labelling
b. Feature engineering
c. Model training
d. Model evaluation and fine tuning
b. Feature engineering
Which of the following is NOT a machine learning algorithm?
a. SVM
b. SVG
c. Random Forest
d. None of the above
b. SVG
Which of the following concerning categorical data is true?
a. We can use the get_dummies method to convert ordinal data
b. For ordinal data, the order of the values carries information/weightage (for example primary, secondary,
tertiary, etc.)
c. Gender (e.g. male and female) is ordinal data
d. We should use the replace method with a dictionary to convert nominal data
b. For ordinal data, the order of the values carries information/weightage (for example primary, secondary,
tertiary, etc.)
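Ordinal categories are usually mapped to integers that preserve their order (in pandas, typically `Series.map` or `Series.replace` with a dictionary), while nominal categories such as gender are one-hot encoded (for example with `pandas.get_dummies`) so that no artificial order is implied. A plain-Python sketch of both, with an invented ordering dictionary:

```python
# Hypothetical ordering for an ordinal "education level" column.
EDUCATION_ORDER = {"primary": 0, "secondary": 1, "tertiary": 2}


def encode_ordinal(values, order):
    """Map each ordinal category to its rank, keeping the order information."""
    return [order[v] for v in values]


def one_hot(values):
    """One 0/1 column per nominal category; no order is implied."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


print(encode_ordinal(["secondary", "primary", "tertiary"], EDUCATION_ORDER))
print(one_hot(["male", "female", "male"]))
```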
Which of the following statements is FALSE?
a. Underfitting is when a model learns the training data very well but performs poorly on the test data
b. An overfitting model tends to be more complex than an underfitting model.
c. An underfitting model draws a very simple relationship between the input features and the output target
d. An overfitting model does not perform well on the test data
a. Underfitting is when a model learns the training data very well but performs poorly on the test data
What are the reasons for data preprocessing?
a. Clean up missing data
b. Handling outliers
c. scale data to a suitable range
d. All the above
d. All the above
How do you handle missing or corrupted data in a dataset?
a. Drop missing rows or columns
b. Replace missing values with mean/median/mode
c. Assign a unique category to missing values
d. All of the above
d. All of the above
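The three strategies can be sketched in plain Python, assuming missing values are represented as `None` (pandas offers `dropna` and `fillna` for the same jobs):

```python
import statistics


def drop_missing_rows(rows):
    """Strategy a: keep only rows with no missing value."""
    return [r for r in rows if None not in r]


def impute(column, strategy="mean"):
    """Strategy b: fill None with the mean, median or mode of observed values."""
    observed = [v for v in column if v is not None]
    fill = {"mean": statistics.mean,
            "median": statistics.median,
            "mode": statistics.mode}[strategy](observed)
    return [fill if v is None else v for v in column]


def flag_missing(column, label="missing"):
    """Strategy c: treat missing as its own category (categorical columns)."""
    return [label if v is None else v for v in column]


print(impute([1, None, 3, 8], "mean"))  # [1, 4, 3, 8]
```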
When performing regression or classification, which of the following is the correct way to preprocess the data?
a. Normalize the data -> PCA -> training
b. PCA -> normalize PCA output -> training
c. Normalize the data -> PCA -> normalize PCA output -> training
d. None of the above
a. Normalize the data -> PCA -> training
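Normalising before PCA matters because PCA looks for directions of maximum variance, so a feature measured in larger units would otherwise dominate the components. A minimal sketch of the first step, z-scoring each column with the population standard deviation (the rows are invented; the PCA step itself is not shown):

```python
import statistics


def standardize_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    return [
        [(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


ROWS = [[1, 100], [2, 200], [3, 300]]
print(standardize_columns(ROWS))
```

After scaling, the two columns above become identical, so neither dominates PCA purely because of its units.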
Predicting whether a tumour is malignant or benign is an example of?
a. Unsupervised Learning
b. Supervised Regression Problem
c. Supervised Classification Problem
d. Categorical Attribute
c. Supervised Classification Problem
Engineering a good feature space is a crucial ___ for the success of any machine learning model.
a. Pre-requisite
b. Process
c. Objective
d. None of the above
a. Pre-requisite
The transformations applied to the identified data before feeding it into the algorithm are called:
a. Problem Identification
b. Identification of Required Data
c. Data Pre-processing
d. Definition of Training Data Set
c. Data Pre-processing
Which of the following are not classification problems?
(Choose two)
a. Predicting the price of a house
b. Predicting whether a patient has a tumor
c. Predicting which team will win the football league title
d. Predicting a student's percentage for the next semester
a. Predicting the price of a house
d. Predicting a student's percentage for the next semester
Out of 200 emails, a classification model correctly predicted 150 spam emails and 30 ham emails.
What is the accuracy of the model?
a. 10%
b. 90%
c. 75%
d. None of the above
b. 90%
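Accuracy is the fraction of all predictions that are correct, counting both correctly identified spam (true positives) and correctly identified ham (true negatives):

```latex
\text{accuracy} = \frac{TP + TN}{\text{total}} = \frac{150 + 30}{200} = \frac{180}{200} = 0.90 = 90\%
```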
Which neural network architecture would be most suited to handle an image identification problem (recognizing a dog in a photo)?
a. Multi Layer Perceptron
b. Convolutional Neural Network
c. Recurrent Neural network
d. Perceptron
b. Convolutional Neural Network
Price prediction in the domain of real estate is an example of?
a. Unsupervised Learning
b. Supervised Regression Problem
c. Supervised Classification Problem
d. Unsupervised regression problem
b. Supervised Regression Problem