Lecture 3-Intro to ML Flashcards

Question 1

Q

Why use Machine Learning? (4 reasons)

Answer

A

-Increase in Data Generation
-Improve Decision Making
-Uncover patterns and trends in data
-Solve complex problems

Question 2

Q

When was ML created, and by whom?

Answer

A

1959, by Arthur Samuel (who coined the term "machine learning")

Question 3

Q

What are the 8 steps of the ML process for training data?

Answer

A

1.Dataset
2.Data cleaning
3.Feature Engineering
4.Training data
5.Learning algorithm
6.Train model
7.Score model
8.Evaluate model
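
The eight steps above can be sketched end to end on a toy problem. Everything in this sketch is an illustrative assumption: the made-up height/weight dataset, the BMI feature, and the choice of a nearest-centroid learner are not taken from the lecture.

```python
# 1. Dataset: hypothetical rows of (height_cm, weight_kg, label); None = missing value.
raw = [
    (150.0, 50.0, "small"),
    (160.0, None, "small"),
    (155.0, 55.0, "small"),
    (180.0, 85.0, "large"),
    (185.0, 90.0, "large"),
    (175.0, 80.0, "large"),
]

# 2. Data cleaning: drop rows that contain a missing value.
clean = [row for row in raw if None not in row]

# 3. Feature engineering: derive one feature (body-mass index) from the raw columns.
def bmi(height_cm, weight_kg):
    return weight_kg / (height_cm / 100) ** 2

examples = [((bmi(h, w),), label) for h, w, label in clean]

# 4. Training data: keep the last example aside so it is unseen during training.
train, held_out = examples[:-1], examples[-1:]

# 5. Learning algorithm: nearest centroid (one mean feature vector per class).
def train_nearest_centroid(data):
    sums, counts = {}, {}
    for features, label in data:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(model, features):
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# 6. Train model.
model = train_nearest_centroid(train)

# 7. Score model: predict labels for the held-out examples.
predictions = [predict(model, features) for features, _ in held_out]

# 8. Evaluate model: compare predictions with the known labels.
accuracy = sum(p == y for p, (_, y) in zip(predictions, held_out)) / len(held_out)
```

With a single held-out example the accuracy figure is not meaningful; the point is only to show where each step sits in the flow.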

Question 4

Q

What are the 6 steps of the ML process for new data?

Answer

A

1.Dataset
2.Data cleaning
3.Feature Engineering
4.New data
5.Score model
6.Evaluate model

Question 5

Q

What is the task that takes the most time in ML process?

Answer

A

Data cleaning, which is often estimated to take 80-90% of the time

Question 6

Q

What are the 3 types of Machine Learning?

Answer

A

-Supervised learning
-Unsupervised learning
-Reinforcement learning

Question 7

Q

What is supervised learning?

Answer

A

The machine learns by using labelled data

Question 8

Q

What is unsupervised learning?

Answer

A

The machine is trained on unlabeled data without any guidance
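
As an illustration of learning from unlabeled data, here is a minimal sketch of k-means clustering on 1-D points. The algorithm choice, the sample points, and the starting centroids are assumptions made for this example; the cards do not name a specific unsupervised method here.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# No labels are supplied: the two groups are discovered from the values alone.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```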

Question 9

Q

What is reinforcement learning?

Answer

A

An agent interacts with its environment by taking actions and learns from the rewards and errors (penalties) that result
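
A minimal sketch of this agent/environment loop, using tabular Q-learning on a hypothetical 4-state corridor where only the rightmost state gives a reward. The environment, the hyperparameters, and the choice of Q-learning are all assumptions for illustration.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Hypothetical environment: reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # cap on episode length
            action = rng.choice(ACTIONS)          # explore uniformly at random
            next_state, reward = step(state, action)
            done = next_state == N_STATES - 1
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy: in each non-goal state, pick the action with the highest value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy should prefer moving right (+1) in every non-goal state, since that is the only way to reach the reward.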

Question 10

Q

What does EDA stand for in supervised and unsupervised learning?

Answer

A

Exploratory Data Analysis

Question 11

Q

What is ML widely used in?

Answer

A

-In data mining, aka Knowledge Discovery in Databases (KDD)
Examples: clustering, anomaly detection, association rule mining

Question 12

Q

What is prior (or unconditional) probability?

Answer

A

Probability of an event before any evidence is obtained

Question 13

Q

What is posterior (or conditional) probability?

Answer

A

Probability of an event given that you know that some evidence is true
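
Prior and posterior are linked by Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E). The sketch below works one example through; the spam-filter setting and all three input probabilities are made-up numbers chosen only to make the arithmetic easy to follow.

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H), via Bayes' theorem."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H).
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam (the prior); a given word appears
# in 60% of spam and in 5% of non-spam messages.
p_spam_given_word = bayes_posterior(prior=0.2, likelihood=0.6, likelihood_if_not=0.05)
```

Here the evidence raises the probability of spam from the prior of 0.2 to a posterior of about 0.75 (0.12 / 0.16).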

Question 14

Q

What is a Naive Bayes Classifier?

Answer

A

A simple probabilistic classifier based on Bayes’ theorem where:
-there’s a strong (naive) independence assumption, which often does not hold
-the features/attributes are assumed to be conditionally independent given the class
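
Under that assumption, the score for a class is its prior multiplied by one per-feature likelihood for each attribute. Below is a minimal sketch for categorical features; the function names, the toy weather data, and the use of Laplace (add-one) smoothing are assumptions for illustration, not details from the lecture.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    vocab = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def posterior(model, row):
    """Normalised P(class | row) under the conditional-independence assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                 # prior P(class)
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities.
            score *= (value_counts[(i, label)][value] + 1) / (n + len(vocab[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
probs = posterior(model, ("rainy", "mild"))
prediction = max(probs, key=probs.get)
```

The returned dictionary holds a probability per class, so the prediction comes with a confidence value rather than a bare label.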

Question 15

Q

What are 4 pros of Naive Bayes Classification?

Answer

A

-Very effective on real-world tasks
-Used as baseline algo before trying other methods
-Fast, simple
-Gives confidence in its class predictions

Question 16

Q

What is the main con in Naive Bayes Classification?

Answer

A

-Makes a strong assumption of conditional independence that is often INCORRECT

Question 17

Q

How do we evaluate a learned model, i.e. check that what was learned is correct?

Answer

A

You run your classifier on a data set of unseen examples (that you did not use for training) for which you know the correct classification

Question 18

Q

What are the 3 sub-sets we can divide the data set into?

Answer

A

1.Actual training set (~80% of the training data)
2.Validation set (~20% of the training data)
3.Test set (held out from the rest, commonly ~20% of the full data set)
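
One common way to obtain the three subsets is to hold out the test set first and then split the remainder into training and validation parts. The sketch below assumes that scheme; the function name and the 20% default fractions are illustrative assumptions, not values prescribed by the lecture.

```python
import random

def three_way_split(examples, test_fraction=0.2, validation_fraction=0.2, seed=0):
    """Shuffle, hold out a test set, then split the rest into train/validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Hold out the test set first so it never influences training or tuning.
    n_test = round(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    # Split what remains into a validation set and the actual training set.
    n_val = round(len(rest) * validation_fraction)
    validation, train = rest[:n_val], rest[n_val:]
    return train, validation, test

train, validation, test = three_way_split(range(100))
```

With 100 examples and these defaults, the split is 64 training, 16 validation, and 20 test examples, with no example appearing in more than one subset.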

Question 19

Q

What are the metrics used when evaluating a learning model?

Answer

A

-Accuracy
-Recall
-Precision
-F-measure

Question 20

Q

What is the def of accuracy?

Answer

A

-% of instances of the test set the algo correctly classifies
-How many % were correct overall?

Question 21

Q

What is the definition of recall?

Answer

A

Of all actual instances of class C, how many % were found correctly?

Question 22

Q

What is the def of precision?

Answer

A

Of the detected instances of C, how many % were correct?
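
The definitions of accuracy, recall, and precision can be computed directly from true and predicted labels. In this sketch the function name and the sample labels are made up, and the F-measure is taken to be the common F1 variant (the harmonic mean of precision and recall), which is an assumption since the cards do not define it.

```python
def evaluate(y_true, y_pred, positive):
    """Accuracy over all classes; recall, precision and F1 for class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # found correctly
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed instances
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}

# Hypothetical test set: 4 instances of class C and 6 of another class.
y_true = ["C", "C", "C", "C", "other", "other", "other", "other", "other", "other"]
y_pred = ["C", "C", "C", "other", "C", "other", "other", "other", "other", "other"]
metrics = evaluate(y_true, y_pred, positive="C")
```

Here 8 of 10 predictions are correct (accuracy 0.8); 3 of the 4 real C instances are found (recall 0.75); and 3 of the 4 predicted C instances are right (precision 0.75).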

Question 23

Q

When to use accuracy?

Answer

A

When all classes are equally important and equally represented (balanced)

Question 24

Q

When to use recall, precision & f-measure?

Answer

A

When one class is more important than the others
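
A small sketch of why accuracy alone can mislead when the important class is rare. The class names and the 5/95 split are made-up values for illustration.

```python
# Hypothetical imbalanced test set: 5 "rare" instances, 95 "common" ones.
y_true = ["rare"] * 5 + ["common"] * 95
# A trivial classifier that always predicts the majority class.
y_pred = ["common"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
found = sum(t == "rare" and p == "rare" for t, p in zip(y_true, y_pred))
recall_rare = found / y_true.count("rare")
```

The classifier scores 95% accuracy while finding none of the rare instances (recall 0), which is why recall, precision, and F-measure are preferred when one class matters more.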