Final exam Flashcards

Question 1

Q

Explain the difference between supervised and unsupervised learning.

Answer

A

Supervised learning trains a model on labelled data, i.e. using both input data and the corresponding known output data, and then uses the model to predict outputs for new inputs. Unsupervised learning uses only input data, without any output labels: it finds hidden patterns or structure (for example, groups of similar samples) in the input data.

Question 2

Q

What types of tasks are classified as supervised and what kind of tasks as unsupervised learning?

Answer

A

Supervised: classification and regression
Unsupervised: clustering

Question 3

Q

Explain cross-validation.

Answer

A

Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample: it estimates how well a model will perform on data it was not trained on. The data are split into k parts (folds); the model is trained on k-1 folds and evaluated on the remaining fold, and this is repeated k times so that every fold is used once as the test set. The k test scores are then averaged to give the final estimate.
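The k-fold procedure can be sketched in plain Python. This is a minimal illustration, not a library API: `k_fold_indices`, `cross_validate` and the toy mean-predictor model are hypothetical helpers, and shuffling with a fixed seed is an assumption.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, train_fn, score_fn):
    """Train on k-1 folds, score on the held-out fold, repeat k times and average."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score_fn(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return mean(scores), scores


# Hypothetical toy model: always predicts the mean target seen in training.
def train_mean(xs, ys):
    return mean(ys)


def mean_abs_error(model, xs, ys):
    return mean(abs(model - y) for y in ys)


xs = list(range(10))
ys = [2 * x for x in xs]
avg_score, fold_scores = cross_validate(xs, ys, k=5, train_fn=train_mean, score_fn=mean_abs_error)
print(len(fold_scores), avg_score)
```

Every sample lands in exactly one test fold, so each data point is used for testing once and for training k-1 times.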

Question 4

Q

Draw a 2x2 confusion matrix and explain what each box means.

Answer

A

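                      Predicted class
                      Class=1    Class=0
Actual    Class=1     TP         FN
class     Class=0     FP         TN

TP – True Positive: an actual positive (class 1) correctly predicted as positive.
FN – False Negative: an actual positive wrongly predicted as negative (class 0).
FP – False Positive: an actual negative wrongly predicted as positive.
TN – True Negative: an actual negative correctly predicted as negative.

A confusion matrix is an NxN matrix, where N is the number of classes, used to evaluate the performance of a classification model by comparing the actual class with the predicted class. The 2x2 form above applies to binary (2-class) models: C = {0, 1}.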
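Counting the four boxes can be sketched as follows; `confusion_matrix_2x2` is a hypothetical helper, and class 1 is assumed to be the positive class, with rows and columns in the same order as the matrix above.

```python
def confusion_matrix_2x2(actual, predicted, positive=1):
    """Count TP, FN, FP, TN for a binary problem with classes {0, 1}."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive correctly predicted as positive
        elif a == positive:
            fn += 1  # positive wrongly predicted as negative
        elif p == positive:
            fp += 1  # negative wrongly predicted as positive
        else:
            tn += 1  # negative correctly predicted as negative
    # Rows = actual class (1, 0); columns = predicted class (1, 0).
    return [[tp, fn], [fp, tn]]


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix_2x2(actual, predicted))  # [[3, 1], [1, 3]]
```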

Question 5

Q

State 2 types of Fuzzy logic and explain the difference.

Answer

A

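Mamdani and Sugeno (Takagi-Sugeno-Kang). In a Mamdani system the rules, usually defined by the user or an expert, have fuzzy sets as outputs: the inputs are fuzzified, the rule outputs are aggregated into an output fuzzy set, and this set is defuzzified (e.g. by the centroid method) to obtain a crisp output. In a Sugeno system the inputs are fuzzified in the same way, but each rule output is a crisp function of the inputs (a constant or a linear function), so the final output is the weighted average of the rule outputs and no separate defuzzification step is needed. Sugeno rules are well suited to being generated and tuned from a data set of inputs and outputs (e.g. by ANFIS).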
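A first-order Sugeno inference step can be sketched in a few lines. The membership functions `low`/`high` and the two linear rule consequents are hypothetical, chosen only to show that each rule produces a crisp value and that the output is their weighted average.

```python
def low(x):
    """Membership in 'low': 1 for x <= 0, falling linearly to 0 at x >= 10."""
    return max(0.0, min(1.0, (10.0 - x) / 10.0))


def high(x):
    """Membership in 'high': 0 for x <= 0, rising linearly to 1 at x >= 10."""
    return max(0.0, min(1.0, x / 10.0))


# Each rule: (antecedent membership function, crisp consequent function).
RULES = [
    (low, lambda x: 0.5 * x + 1.0),   # IF x is low  THEN y = 0.5x + 1
    (high, lambda x: 2.0 * x - 4.0),  # IF x is high THEN y = 2x - 4
]


def sugeno(x):
    # Fuzzification: firing strength of each rule.
    weights = [mf(x) for mf, _ in RULES]
    # Each rule gives a crisp output; combine them by weighted average.
    outputs = [f(x) for _, f in RULES]
    return sum(w * y for w, y in zip(weights, outputs)) / sum(weights)


for x in (0.0, 5.0, 10.0):
    print(x, sugeno(x))
```

A Mamdani version would instead clip or scale an output fuzzy set per rule, aggregate those sets, and defuzzify the result.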

Question 6

Q

Define sensitivity and specificity in words and using formulas.

Answer

A

Sensitivity = TP/(TP+FN) (true positive rate, TPR)
Specificity = TN/(TN+FP) (true negative rate, TNR)

Sensitivity is the ratio of positives correctly classified as positive to all actual positives, i.e. it shows how good the model is at recognizing the positive class; it is also called the true positive rate.

Specificity is the ratio of negatives correctly classified as negative to all actual negatives, i.e. it shows how good the model is at recognizing the negative class; it is also called the true negative rate. The false positive rate is its complement: FPR = FP/(FP+TN) = 1 - specificity.
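Both formulas translate directly into code; the counts below are hypothetical.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives classified as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp)


tp, fn, fp, tn = 40, 10, 5, 45  # hypothetical confusion-matrix counts
print(sensitivity(tp, fn))      # 0.8
print(specificity(tn, fp))      # 0.9
print(1 - specificity(tn, fp))  # false positive rate, about 0.1
```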

Question 7

Q

Explain the correlation coefficient. In which kind of problems is it used, and which other performance metrics are used in such problems?

Answer

A

The correlation coefficient is a statistical measure of the strength of a linear relationship between two variables.
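It is calculated (as Pearson's correlation coefficient) as:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the means of the two variables; r ranges from -1 (perfect negative) to +1 (perfect positive linear relationship), and 0 means no linear relationship.
It is used in regression problems, where it measures how strongly the predicted values correlate with the actual values.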
Performance metrics which are used are mean-absolute error (MAE), root mean-squared error (RMSE), relative absolute error (RAE), root relative squared error (RRSE).
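A sketch of r and the four error metrics for a regression model's predictions. RAE and RRSE are computed relative to a baseline that always predicts the mean of the actual values; they are returned here as ratios, although they are often reported as percentages. The helper names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def regression_metrics(actual, predicted):
    n = len(actual)
    mean_a = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return {
        "r": pearson_r(actual, predicted),
        "MAE": sum(abs_err) / n,
        "RMSE": sqrt(sum(sq_err) / n),
        # Relative to always predicting the mean of the actual values.
        "RAE": sum(abs_err) / sum(abs(a - mean_a) for a in actual),
        "RRSE": sqrt(sum(sq_err) / sum((a - mean_a) ** 2 for a in actual)),
    }


print(regression_metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example shows why r alone is not enough: predictions that are all off by exactly 1 still give r = 1, while MAE and RMSE reveal the error.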

Question 8

Q

What is the class imbalance problem? Which performance metric is typically used in such cases? How would you prevent this issue?

Answer

A

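The class imbalance problem occurs when some classes have many more instances than others (for example, 500 samples in class 1 and only 30 samples in the other class). A classifier can then reach high accuracy simply by always predicting the majority class, so plain accuracy is misleading; the typical performance metric is average (per-class) recall, also called balanced accuracy. To deal with the issue, either change the performance metric or resample the training data (oversample the minority class and/or undersample the majority class).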
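The 500 vs. 30 example can be checked numerically, together with simple random oversampling. Labelling the minority class as 2, the always-majority "model", and the helper names are all hypothetical choices made for this sketch.

```python
import random
from collections import Counter


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def average_recall(actual, predicted):
    """Mean of the per-class recalls (balanced accuracy)."""
    recalls = []
    for c in sorted(set(actual)):
        idx = [i for i, a in enumerate(actual) if a == c]
        recalls.append(sum(predicted[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(xs, ys, seed=0):
    """Duplicate random minority samples until every class has as many as the largest."""
    rng = random.Random(seed)
    counts = Counter(ys)
    target = max(counts.values())
    out_x, out_y = list(xs), list(ys)
    for c, n in counts.items():
        pool = [x for x, y in zip(xs, ys) if y == c]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(c)
    return out_x, out_y


actual = [1] * 500 + [2] * 30
predicted = [1] * 530  # useless model that always predicts the majority class
print(round(accuracy(actual, predicted), 3))  # 0.943
print(average_recall(actual, predicted))     # 0.5
```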

Question 9

Q

Is ANFIS a neural network or a fuzzy system?

Answer

A

Both. ANFIS (Adaptive Neuro-Fuzzy Inference System) is a hybrid intelligent system that combines the capabilities of both neural networks and fuzzy logic. It utilizes fuzzy logic to handle uncertainty and imprecision in data and combines it with the learning capabilities of neural networks to adaptively tune the parameters of the fuzzy inference system. Structurally, it is a Sugeno-type fuzzy inference system implemented as an adaptive network.

Question 10

Q

Explain the key differences between PSO and GA in terms of principle of operation and terminology.

Answer

A

PSO – Particle Swarm Optimization, GA – Genetic Algorithm.
- PSO has no genetic operators (crossover, mutation);
- In PSO particles have memory (each particle remembers its best position so far) and there is no selection or removal of population members, while GA has no such memory and members are selected and removed in every generation;
- In PSO all particles survive the whole run (no survival of the fittest);
- PSO is usually more computationally efficient (faster);
- PSO typically needs more iterations than GA, but fewer particles;
- In GA the variables are called genes, in PSO dimensions;
- In PSO the topology is constant, while in GA it changes.
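A minimal global-best PSO sketch makes the differences concrete: there is no crossover, mutation or selection, every particle survives, and each particle keeps a memory of its personal best. The sphere test function and the parameter values (inertia `w`, coefficients `c1`, `c2`, bounds) are common textbook choices assumed here, not values from the course.

```python
import random


def sphere(position):
    """Toy objective to minimize; the optimum is 0 at the origin."""
    return sum(x * x for x in position)


def pso(fitness, dim, n_particles=20, iterations=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Memory: every particle remembers its own best position (pbest).
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):  # no particle is ever removed
            for d in range(dim):      # each variable is a "dimension"
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```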

Question 11

Q

Explain the difference between feature selection and extraction.

Answer

A

Feature selection is the selection of relevant attributes for model construction, i.e. it returns a subset of the original features, unchanged. Feature extraction creates new attributes from the original data, i.e. the new features are functions (combinations or transformations) of the original features, as in PCA.
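The contrast can be shown on a tiny hypothetical data set: selection keeps some original columns as they are, while extraction computes a new column (here a BMI-style feature) from existing ones. The column names and helpers are illustrative assumptions.

```python
# Each row: (height_m, weight_kg, shoe_size); hypothetical data.
rows = [(1.80, 81.0, 44), (1.65, 60.0, 38), (1.72, 70.0, 41)]


def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns, unchanged."""
    return [tuple(r[i] for i in keep) for r in rows]


def extract_bmi(rows):
    """Feature extraction: build a new feature as a function of original features."""
    return [(weight / height ** 2,) for height, weight, _ in rows]


print(select_features(rows, keep=[0, 1]))
print(extract_bmi(rows))
```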

Question 12

Q

List 3 main types of feature selection.

Answer

A

Wrapper, filter, embedded

Question 13

Q

Explain the difference between relevant and redundant features.

Answer

A

Relevant refers to information, data, or features that are directly related to the problem or task at hand. Relevant information contributes meaningfully to achieving the objective or understanding the subject matter. In other words, relevant items are essential, valuable, or useful in addressing the specific goal or question being considered.
Redundant, on the other hand, refers to information, data, or features that are unnecessary, repetitive, or duplicative. A redundant feature may even be relevant on its own, but it adds no new value because the same information is already carried by other features (for example, the same temperature recorded in both °C and °F). Irrelevant features, by contrast, are simply unrelated to the problem being addressed.
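One way to act on both ideas is a simple correlation-based filter: drop features weakly correlated with the target (irrelevant) and features strongly correlated with an already kept feature (redundant). The thresholds, feature names and data below are hypothetical, and absolute Pearson correlation is only one possible relevance/redundancy measure.

```python
from math import sqrt


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def select_relevant_non_redundant(features, target, relevance_min=0.5, redundancy_max=0.9):
    """Greedy filter: rank features by |corr| with the target, skip redundant ones."""
    ranked = sorted(features, key=lambda n: abs(corr(features[n], target)), reverse=True)
    kept = []
    for name in ranked:
        if abs(corr(features[name], target)) < relevance_min:
            continue  # irrelevant: weak link to the target
        if any(abs(corr(features[name], features[k])) > redundancy_max for k in kept):
            continue  # redundant: repeats information of a kept feature
        kept.append(name)
    return kept


target = [10, 20, 30, 40, 50]
features = {
    "temp_c": [1, 2, 3, 4, 5],
    "temp_f": [33.8, 35.6, 37.4, 39.2, 41.0],  # same information as temp_c
    "noise": [3, 1, 4, 1, 5],
}
print(select_relevant_non_redundant(features, target))  # only one temperature feature survives
```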

Question 14

Q

Why is k-NN termed as lazy learner?

Answer

A

k-NN does not build an explicit model during training, unlike decision trees and rule-based systems. Instead of learning a discriminative function from the training data, it “memorizes” (stores) the training data set and postpones all computation until prediction time, when it looks up the k nearest training samples of the query point. This is why it is called a lazy learner.
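A minimal k-NN classifier shows the lazy behaviour: `fit` only stores the data, and all distance computation happens in `predict_one`. The class name, Euclidean distance and majority vote are assumptions of this sketch.

```python
from collections import Counter
from math import dist


class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data; no model is built.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # All the work happens at query time: find the k nearest samples...
        nearest = sorted(range(len(self.X)), key=lambda i: dist(self.X[i], x))[: self.k]
        # ...and return the majority label among them.
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
model = KNNClassifier(k=3).fit(X, y)
print(model.predict_one((0.5, 0.5)))  # a
print(model.predict_one((5.5, 5.5)))  # b
```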

Question 15

Q

What is the purpose of the kernel trick in SVM?

Answer

A

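The kernel trick lets an SVM solve problems that are not linearly separable. Conceptually it does the opposite of feature selection: it adds dimensions, mapping the data from a lower- to a higher-dimensional space where a linear separating hyperplane may exist. The mapping is never computed explicitly: the kernel function returns the inner product of two mapped points directly from the original inputs, so the SVM can deal with non-linearity and high dimensionality at low cost.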
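The trick can be verified numerically for the degree-2 polynomial kernel k(x, z) = (x · z)^2 on 2-D inputs, whose explicit feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The example points are arbitrary; the kernel gives the same value as the 3-D dot product without ever building the 3-D vectors.

```python
from math import isclose, sqrt


def phi(x):
    """Explicit map of a 2-D point into 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (0.5, -1.5), (2.0, 4.0)
print(dot(phi(x), phi(z)), poly_kernel(x, z))  # both approximately 25
```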

Question 16

Q

Image segmentation is an example of which type of learning? What about price prediction?

Answer

A

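Image segmentation: deep learning (a supervised task in which every pixel is assigned a class). Price prediction: supervised learning, specifically regression, because the output is a continuous value.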

Question 17

Q

What is the key difference between deep and shallow learning?

Answer

A

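Deep learning uses neural networks with many layers and learns features automatically from raw data, so it does not need hand-crafted feature engineering or a data base of pre-extracted features; it generates the features itself. Shallow learning uses neural networks with only 2 or 3 layers, which require a data base in which the features have already been extracted.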

Question 18

Q

List 2 types of learning methods or approaches.

Answer

A

Model-based, instance-based

Question 19

Q

What is overfitting and how to prevent it?

Answer

A

Overfitting is an undesirable machine learning behavior that occurs when the model gives accurate predictions for training data but not for new data. When data scientists use machine learning models for making predictions, they first train the model on a known data set; then, based on this information, the model tries to predict outcomes for new data sets. An overfit model has learned noise and details specific to the training data, so it gives inaccurate predictions and does not generalize well to new data. Ways to prevent (or detect) overfitting:
1. Hold-out (validation set)
2. Cross-validation
3. Data augmentation
4. Feature selection
5. L1 / L2 regularization
6. Reduce the number of layers or the number of units per layer
7. Dropout
8. Early stopping
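Early stopping can be sketched as a check on the validation loss after each epoch: training stops once the loss has not improved for `patience` epochs, and the weights from the best epoch are kept. The function name, the `patience` value and the loss curve are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch, stopping once validation loss fails to improve `patience` times in a row."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; restore weights from best_epoch
    return best_epoch


# Hypothetical validation losses: they fall, then rise as the model starts to overfit.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(losses))  # 3
```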

Question 20

Q

Explain the similarity between PSO and GA in terms of principle of operation and terminology.

Answer

A

Population-based Optimization: Both PSO and GA are population-based optimization algorithms. They work with a population of potential solutions to the optimization problem, rather than just a single solution.
Iterative Improvement: Both algorithms iteratively improve the population of solutions over successive generations or iterations. Each iteration aims to find better solutions by refining the current population.
Stochastic Search: Both PSO and GA employ stochastic elements in their search processes. They use randomness to explore the solution space, which reduces the risk of getting trapped in local optima.
Fitness Evaluation: Both algorithms use a fitness function to evaluate the quality of candidate solutions within the population. The fitness function provides a measure of how well a particular solution performs with respect to the optimization objective.
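These shared principles (a population, iterative improvement, randomness, and a fitness function) are all visible in a minimal GA sketch on the hypothetical OneMax toy problem (maximize the number of 1-genes). Tournament selection, one-point crossover, bit-flip mutation and elitism are common textbook choices assumed here, not the only options. In GA terms each variable is a gene and each iteration a generation.

```python
import random


def fitness(genes):
    """OneMax toy problem: count the 1-genes (to be maximized)."""
    return sum(genes)


def tournament(pop, rng):
    """Selection: the fitter of two random members becomes a parent."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_algorithm(n_genes=20, pop_size=30, generations=60, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: carry over the current best
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randint(1, n_genes - 1)
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children  # the old generation is replaced
    best = max(pop, key=fitness)
    return best, fitness(best)


best_genes, best_score = genetic_algorithm()
print(best_score)
```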

Question 21

Q

What is transfer learning?

Answer

A

Transfer learning is a machine learning technique where a model trained on one task is re-purposed or adapted for a different but related task. In transfer learning, knowledge gained from solving one problem is applied to a different but related problem, typically when the data for the new task is limited or expensive to obtain. Instead of training a model from scratch for the new task, transfer learning leverages the knowledge or representations learned from a source task to improve performance on the target task.

Question 22

Q

List a few simple data (image) augmentation approaches!

Answer

A

Horizontal Flipping
Random Rotation
Random Cropping
Random Scaling
Random Translation
Brightness and Contrast Adjustment
Gaussian Blur
Color Jittering
Adding Noise
Elastic Deformation
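A few of these augmentations can be sketched on a grayscale image stored as a list of rows of 0..255 integers (a hypothetical representation; real pipelines would use an image library). Rotation here is limited to 90-degree steps, and the crop offsets and noise level are illustrative.

```python
import random


def hflip(img):
    """Horizontal flip: mirror every row."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img, top, left, h, w):
    """Cut out an h x w window starting at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def adjust_brightness(img, delta):
    """Add a constant to every pixel, clipped to 0..255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def add_noise(img, sigma, rng):
    """Add Gaussian noise with standard deviation sigma, clipped to 0..255."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row] for row in img]


rng = random.Random(0)
img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
top, left = rng.randint(0, 1), rng.randint(0, 1)  # random crop position
augmented = [hflip(img), rotate90(img), crop(img, top, left, 2, 2),
             adjust_brightness(img, 30), add_noise(img, 5.0, rng)]
for a in augmented:
    print(a)
```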
