Lecture 2 - Introduction to Machine Learning Flashcards
What is Machine Learning?
The study of algorithms that can learn from data and make predictions on new, unseen data.
What are the primary differences between traditional programming and machine learning?
** Traditional Programming **
Can be seen as having one stage
Data -> Program -> Output (Output being the focus)
** Machine Learning **
Can be seen as having two stages
1. Training
Data -> Algorithm -> Output (Algorithm being the focus: it is fitted to the training data)
2. Deployment
New Data -> Trained Algorithm -> Output (Output being the focus)
What is an example timeline of a machine learning project?
Data Collection -> Preprocessing -> Exploratory Data Analysis -> Model Building -> Splitting the Data -> Training and Validation -> Testing the model -> Deploying the model
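The "Splitting the Data" step can be sketched in plain Python. This is a minimal sketch, assuming the data is already a list of preprocessed examples; the function name `train_val_test_split` and the 60/20/20 proportions are illustrative choices, not from the lecture.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into train, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]       # everything left over is for training
    return train, val, test

data = list(range(100))  # stand-in for 100 preprocessed examples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 60 20 20
```

The model is fitted on `train`, tuned against `val`, and evaluated once on `test`, matching the "Training and Validation -> Testing the model" steps.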
What is CRISP-DM?
CRISP-DM is an acronym for Cross-Industry Standard Process for Data Mining
It is a commonly used methodology for data mining
It divides data mining into six phases starting from business understanding
What are the six stages of CRISP-DM?
The phases are iterative and non-linear (the process can move back and forth between them), but the general phase structure looks like this:
Business Understanding <-> Data Understanding -> Data Preparation <-> Modelling -> Evaluation -> Deployment
What is supervised machine learning?
Supervised machine learning is when the training data includes labels (the known value of the target variable for each example)
What is the goal of supervised machine learning?
Use predictor variables to predict a target variable
What are some examples of supervised machine learning algorithms?
- linear regression, logistic regression
- decision tree, random forest
- support vector machines (SVM)
- neural networks, generative models (GANs)
What is classification?
Classification is predicting which category (class) something belongs to, e.g. the species of a flower, spam or not spam, whether a customer will click on an ad
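As a minimal sketch of classification, the snippet below learns a one-split decision rule (a depth-1 decision tree, often called a decision stump) from labeled examples. The spam data, the single "number of links" feature, and the names `fit_stump` and `predict` are hypothetical, chosen only to keep the example tiny.

```python
def fit_stump(xs, ys):
    """Learn a one-feature threshold rule: predict 1 if x >= threshold, else 0.

    Tries each observed value as a candidate threshold and keeps the one
    with the fewest training errors (a depth-1 decision tree, or 'stump').
    """
    best_threshold, best_errors = None, None
    for t in sorted(set(xs)):
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

def predict(threshold, x):
    """Assign the category: 1 (spam) or 0 (not spam)."""
    return 1 if x >= threshold else 0

# Hypothetical toy data: number of links in an email -> spam (1) or not spam (0)
links = [0, 1, 1, 2, 5, 6, 8, 9]
spam = [0, 0, 0, 0, 1, 1, 1, 1]
t = fit_stump(links, spam)
print(t)                             # 5
print(predict(t, 3), predict(t, 7))  # 0 1
```

The output is always one of a fixed set of categories, which is what distinguishes classification from regression.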
What is regression?
Regression is predicting a continuous value
Examples: stock prices, temperatures, sales volume, the price of a house
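A minimal regression sketch: fitting a straight line by ordinary least squares (simple linear regression) with one predictor variable. The house-size data is hypothetical and constructed so that price = 2 * size + 1 exactly; `fit_line` is an illustrative name, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: house size (predictor) -> price (target)
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, prices)
print(slope, intercept)         # 2.0 1.0
print(slope * 2.5 + intercept)  # 6.0, a continuous prediction for an unseen size
```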
What are some alternative names for features?
Features = Predictor Variables = Independent Variables
What are some alternative names for target variable?
Target Variable = Dependent Variable = Response Variable
What is unsupervised machine learning?
Unsupervised machine learning is when the data does not include labels (there is no target variable to predict)
What are some examples of unsupervised machine learning algorithms?
** Clustering **
- K-means clustering
- Hierarchical clustering
** Dimensionality Reduction **
- Principal Component Analysis (PCA)
- t-SNE
What is the primary goal of clustering?
Given an unlabeled dataset, the algorithm groups the data points into clusters so that points in the same cluster are similar to each other
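K-means is one way to carry out this grouping. The sketch below is plain Lloyd's algorithm on 2-D points, assuming k is chosen in advance. The data points and the `kmeans` helper are hypothetical, and the centroids start at the first k points only to make the output reproducible (practical implementations typically use random or k-means++ initialisation).

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Centroids are initialised to the first k points for reproducibility.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids

pts = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (0.5, 1.5), (9.5, 8.0)]
labels, centroids = kmeans(pts, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels were given: the algorithm discovers the two groups purely from distances between points.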
What is reinforcement learning? What are some example use cases?
- An agent observes the state of the environment, takes actions and receives rewards
- The agent learns by itself (through trial and error) to maximize its cumulative reward
Examples of use cases: self-driving cars, game-playing AI, robotics
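The agent/state/action/reward loop can be illustrated with tabular Q-learning, one standard reinforcement learning algorithm (the lecture does not name a specific one). Everything here is an assumption for illustration: a hypothetical 5-state corridor where only reaching the last state pays a reward of 1, and the chosen values of `alpha`, `gamma` and `epsilon`.

```python
import random

N_STATES = 5         # states 0..4 on a line; state 4 is the goal
ACTIONS = [-1, +1]   # move left or move right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # reward only for reaching the goal
    return nxt, 0.0, False

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Epsilon-greedy: usually exploit the best known action, sometimes explore
            # (ties are broken at random).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])  # move estimate toward target
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The agent is never told that "right" is correct; the learned policy emerges only from the rewards it collects.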
What are variable types? What three variable types have we primarily worked with?
A variable's type describes what kind of values it can take. The three types we have primarily worked with are:
Categorical (also nominal)
- Set of values without order
- Examples: gender (M, F), hair color (black, brown, red)
Ordinal
- Set of ordered values
- The magnitude of the difference between successive values is not known
- Examples: clothing size (XS, S, M, L, XL)
Continuous (also numeric)
- Integer or real values
- Examples: temperature, year
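One way to see the difference between the three types is in which operations make sense on them. This is a minimal sketch in plain Python, assuming `enum.Enum` for a nominal variable and an explicit ranking list for an ordinal one; the names `HairColor`, `SIZE_ORDER` and `size_less_than` are hypothetical.

```python
from enum import Enum

class HairColor(Enum):  # categorical (nominal): named values, no order
    BLACK = "black"
    BROWN = "brown"
    RED = "red"

SIZE_ORDER = ["XS", "S", "M", "L", "XL"]  # ordinal: only the ranking is meaningful

def size_less_than(a, b):
    """Compare two clothing sizes by rank; the gap between ranks is not a measured quantity."""
    return SIZE_ORDER.index(a) < SIZE_ORDER.index(b)

temperature_c = 21.5  # continuous (numeric): arithmetic is meaningful

print(HairColor.RED == HairColor.RED)  # True: categories support equality
try:
    HairColor.BLACK < HairColor.RED    # plain Enum members define no ordering
except TypeError:
    print("categories have no order")
print(size_less_than("S", "XL"))       # True
print(temperature_c - 19.0)            # 2.5
```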
Out of Categorical, Ordinal and Continuous, which two variable types are discrete?
Categorical and Ordinal