Lecture 3-Intro to ML Flashcards
Why use machine learning (4)?
-Increase in Data Generation
-Improve Decision Making
-Uncover patterns and trends in data
-Solve complex problems
When was ML created, and by whom?
1959, by Arthur Samuel, who coined the term "machine learning"
What is the ML process when working with training data (8)?
1.Dataset
2.Data cleaning
3.Feature Engineering
4.Training data
5.Learning algorithm
6.Train model
7.Score model
8.Evaluate model
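A minimal sketch of the eight steps, using only the Python standard library. The dataset (hours studied vs. passed), the threshold "model", and the function name train_threshold are made up for illustration and are not from the lecture.

```python
# 1. Dataset: toy rows of (hours_studied, passed). None marks a missing value.
raw = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0),
       (6.0, 1), (7.0, 1), (8.0, 1), (2.5, 0)]

# 2. Data cleaning: drop rows with missing values.
clean = [(x, y) for x, y in raw if x is not None]

# 3. Feature engineering: rescale hours to the range [0, 1].
lo = min(x for x, _ in clean)
hi = max(x for x, _ in clean)
data = [((x - lo) / (hi - lo), y) for x, y in clean]

# 4. Training data: hold out the last two rows for evaluation.
train, test = data[:-2], data[-2:]

# 5-6. Learning algorithm + train model: pick the threshold that
# classifies the most training rows correctly.
def train_threshold(rows):
    best_t, best_acc = 0.0, -1.0
    for t, _ in rows:
        acc = sum((x >= t) == bool(y) for x, y in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = train_threshold(train)

# 7. Score model: produce predictions for the held-out rows.
predictions = [int(x >= threshold) for x, _ in test]

# 8. Evaluate model: compare predictions with the true labels.
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

In practice a library model would replace the threshold rule, but the order of the steps stays the same.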
What is the ML process when working with new data (6)?
1.Dataset
2.Data cleaning
3.Feature Engineering
4.New data
5.Score model
6.Evaluate model
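A minimal sketch of the new-data path, assuming a model was already trained earlier. The dictionary "model", its values, and the function name score are hypothetical. No learning happens here: the new data is cleaned, transformed the same way as the training data, scored, and evaluated once the true outcomes are known.

```python
# Assumed output of an earlier training run (hypothetical values).
model = {"lo": 1.0, "hi": 8.0, "threshold": 0.5}

def score(model, hours):
    # Feature engineering must reuse the scaling from training.
    scaled = (hours - model["lo"]) / (model["hi"] - model["lo"])
    return int(scaled >= model["threshold"])

# 1-2. Dataset + data cleaning: drop missing values.
new_raw = [5.0, None, 1.5, 7.5]
new_clean = [h for h in new_raw if h is not None]

# 3-5. Feature engineering, new data, score model.
scores = [score(model, h) for h in new_clean]

# 6. Evaluate model: compare with outcomes observed later (made up here).
actual = [1, 0, 1]
accuracy = sum(p == y for p, y in zip(scores, actual)) / len(actual)
print(scores, accuracy)
```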
Which task takes the most time in the ML process?
Data cleaning, which takes roughly 80-90% of the time
What are the 3 types of Machine Learning?
-Supervised learning
-Unsupervised learning
-Reinforcement learning
What is supervised learning?
The machine learns from labeled data, i.e. examples paired with the correct output
What is unsupervised learning?
The machine is trained on unlabeled data without any guidance
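A minimal sketch of learning from unlabeled data: a one-dimensional, two-cluster version of k-means. The function name two_means and the sample values are made up for illustration. No labels are given, and the algorithm groups the values by itself.

```python
def two_means(values, iterations=10):
    # Start with the smallest and largest values as the two centers.
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to its nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, groups = two_means(values)
print(centers, groups)
```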
What is reinforcement learning?
An agent interacts with its environment by taking actions and learns from the resulting errors and rewards
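A minimal sketch of the action-and-reward loop, using an epsilon-greedy agent on a three-action problem. The function name run_bandit, the reward values, and the fixed (noise-free) rewards are simplifying assumptions for illustration.

```python
import random

def run_bandit(rewards, steps=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # agent's current value estimate per action
    counts = [0] * len(rewards)       # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # the environment responds with a reward
        counts[action] += 1
        # Update the running average of rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is best. It finds out by trying actions and observing the rewards.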
What does EDA stand for in supervised and unsupervised learning?
Exploratory Data Analysis
What is ML widely used in?
-Data mining, also known as Knowledge Discovery in Databases (KDD)
Examples: clustering, anomaly detection, association rule mining
What is prior (or unconditional) probability?
Probability of an event before any evidence is obtained
What is posterior (or conditional) probability?
Probability of an event given that some evidence is known to be true
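A small worked example of going from prior to posterior with Bayes' theorem. The spam-filter numbers are made up for illustration.

```python
# Prior: probability that an email is spam before reading it.
p_spam = 0.2
# Likelihoods: how often the word "free" appears in each kind of email.
p_word_given_spam = 0.6
p_word_given_ham = 0.05

# Total probability of the evidence (the word appears).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: probability of spam given that the word appears.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 2))  # 0.75
```

Seeing the evidence raises the probability of spam from the prior 0.2 to a posterior of about 0.75.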
What is a Naive Bayes classifier?
A simple probabilistic classifier based on Bayes’ theorem where:
-there is a strong ("naive") independence assumption, which often does not hold in practice
-the features/attributes are assumed to be conditionally independent given the class
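A minimal sketch of a Naive Bayes classifier for categorical features, written from scratch. The function names train_nb and predict_proba, the weather-style data, and the add-one smoothing are assumptions for illustration. The independence assumption shows up as a plain product of per-feature probabilities.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(label, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_proba(model, row):
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, v in enumerate(row):
            # P(feature_i = v | class) with add-one smoothing; multiplying
            # these terms is the conditional independence assumption.
            score *= (feature_counts[(label, i)][v] + 1) / (n + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    # Normalize so the class scores sum to 1 (posterior probabilities).
    return {label: s / norm for label, s in scores.items()}

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]

model = train_nb(rows, labels)
probs = predict_proba(model, ("rainy", "cool"))
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

The returned probabilities are the classifier's confidence in each class, here about 0.9 for "yes" and 0.1 for "no".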
What are 4 pros of Naive Bayes classification?
-Often very effective on real-world tasks
-Used as a baseline algorithm before trying other methods
-Fast and simple
-Gives a confidence (probability) for each class prediction