! Session 1 & 2: Intro Flashcards
Algorithm
set of procedures that produces a model when run on training data, e.g., linear regression
Model
the result of fitting an algorithm to data, e.g., a linear regression fitted to predict prices of ‘X’
Parameters
internal variables of the model that are adjusted automatically during training, e.g., the coefficients of a linear regression
Hyperparameters
variables set by the user before training that control the algorithm and define how it learns from the data, e.g., the learning rate
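The four terms above can be seen side by side in a minimal sketch. It assumes plain gradient descent as the training procedure and made-up toy data; the function name `fit_linear_regression` and the default values are illustrative choices, not part of the course material.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, n_epochs=2000):
    """Algorithm: gradient descent for y = slope * x + intercept.

    learning_rate and n_epochs are hyperparameters (set by the user).
    slope and intercept are parameters (learned from the data).
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in zip(xs, ys)) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in zip(xs, ys)) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# The trained model is the algorithm plus its learned parameters.
slope, intercept = fit_linear_regression(xs, ys)
prediction = slope * 5.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, prediction at x=5: {prediction:.3f}")
```

Changing `learning_rate` or `n_epochs` changes how the fit proceeds, but the values of `slope` and `intercept` always come from the data.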
Data Science
= Collection of statistical & ML models that
- support info extraction from data
- offer insights, causality, predictions
ML
- science of programming computers so they learn from data without being explicitly programmed
- component of data science
- AI technique for sophisticated cognitive tasks
ML Functions
- Descriptive: uses data to explain what happened
- Predictive: uses data to predict what will happen
- Prescriptive: uses data to suggest actions
ML Subfields
- Natural language processing: machines learn to understand natural language as spoken and written by humans
- Neural networks: models loosely inspired by the human brain
- Deep learning networks: neural networks with many layers
When does ML work well?
- (Large amounts of) data are available
- Problem is dynamic or fluctuating
- Problem requires predictions or discovering patterns
ML Flow
Preparation
- Identify Question & Task
- Data Collection
- Data preprocessing
- EDA
Model Development
- Feature & Model Selection
- Splitting Data
- Training with train set, validating with validation set
- Evaluating with test set (repeat model development until results are satisfactory)
Communicating Results & Deploying Model
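The model development steps above can be sketched end to end. This is only an illustration under stated assumptions: the data is synthetic, the 60/20/20 split is an arbitrary choice, and k-nearest-neighbour regression (with the hypothetical helpers `knn_predict` and `mse`) stands in for whatever model the task actually calls for.

```python
import random

random.seed(0)

# Data collection (synthetic stand-in): y = 3x + noise.
data = [(x, 3 * x + random.gauss(0, 1))
        for x in (random.uniform(0, 10) for _ in range(200))]

# Splitting data: 60% train, 20% validation, 20% test (assumed proportions).
random.shuffle(data)
train, val, test = data[:120], data[120:160], data[160:]


def knn_predict(train_set, x, k):
    """Predict y as the mean target of the k training points nearest to x."""
    nearest = sorted(train_set, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train_set, eval_set, k):
    """Mean squared error of k-NN predictions on eval_set."""
    return sum((knn_predict(train_set, x, k) - y) ** 2 for x, y in eval_set) / len(eval_set)


# Model selection: pick the hyperparameter k with the lowest validation error.
candidates = [1, 3, 5, 10, 20]
best_k = min(candidates, key=lambda k: mse(train, val, k))

# Final evaluation on the held-out test set.
test_mse = mse(train, test, best_k)
print(f"best k = {best_k}, test MSE = {test_mse:.2f}")
```

The validation set drives the choice of `k`, so the test set stays untouched until the final check and gives a less optimistic error estimate.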
Challenges
- Explainability: What are ML models doing? How are decisions made?
- Bias & unintended outcomes
Challenge: Bias & unintended outcomes
- Insufficient data (not enough training data, non-representative data, irrelevant features)
- Overfitting & underfitting
Bias-Variance Tradeoff
- Goal: low bias & low variance (in practice, reducing one tends to increase the other)
- Bias = error introduced by approximating a real-world phenomenon with a simplified model; high bias -> underfit
- Variance = how much the model's predictions (and hence its test error) change with variation in the training data; shows up as prediction error on data not previously seen by the model; high variance -> overfit
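Both quantities can be estimated empirically by retraining on many different training sets and looking at the predictions at one fixed point. The sketch below assumes a made-up true relationship (`true_f`), Gaussian noise, and k-nearest-neighbour regression as the model family; all names and numbers are illustrative.

```python
import random
import statistics

random.seed(1)


def true_f(x):
    """Hypothetical real-world relationship the models try to approximate."""
    return x * x


def sample_training_set(n=30):
    """Draw a fresh noisy training set of n points from true_f."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, 0.3)) for x in xs]


def knn_predict(train_set, x, k):
    """Predict y as the mean target of the k training points nearest to x."""
    nearest = sorted(train_set, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def bias_and_variance(k, x0=0.9, n_repeats=300):
    """Estimate bias^2 and variance of the prediction at x0 across many training sets."""
    preds = [knn_predict(sample_training_set(), x0, k) for _ in range(n_repeats)]
    bias_sq = (statistics.mean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


# k equals the training-set size: the model always predicts the mean target (underfits).
simple_bias, simple_var = bias_and_variance(k=30)
# k = 1: the model copies the nearest noisy training point (overfits).
flexible_bias, flexible_var = bias_and_variance(k=1)

print(f"simple model:   bias^2={simple_bias:.3f}, variance={simple_var:.3f}")
print(f"flexible model: bias^2={flexible_bias:.3f}, variance={flexible_var:.3f}")
```

The simple model should show the larger bias and the flexible model the larger variance, which is the tradeoff in miniature.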
Subcategories of ML Models
- Based on training
  - supervised
  - unsupervised
  - reinforcement
  - semi-supervised
- Based on working
  - instance-based
  - model-based
Subcategories of ML Models - based on training: supervised and unsupervised
- Supervised = trained with labeled data -> we know the answers we want the model to produce
- Unsupervised = looks for patterns in unlabeled data -> goal: find unknown structures / trends
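The difference can be shown on the same toy data, once with labels and once without. This is a minimal sketch: the data is made up, a nearest-centroid classifier stands in for supervised learning, a two-cluster k-means stands in for unsupervised learning, and the helper names (`classify`, `kmeans_1d`) are hypothetical.

```python
# Toy 1-D data: two clearly separated groups (values made up for illustration).
points = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
labels = ["small", "small", "small", "small", "large", "large", "large", "large"]


def mean(values):
    return sum(values) / len(values)


# Supervised: labels are given, so learn one centroid per known class.
centroids = {
    label: mean([p for p, lab in zip(points, labels) if lab == label])
    for label in set(labels)
}


def classify(x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


# Unsupervised: no labels, so k-means (k=2) has to discover the groups itself.
def kmeans_1d(values, n_iter=10):
    c0, c1 = min(values), max(values)  # simple deterministic initialisation
    for _ in range(n_iter):
        # Assign every value to its nearest centre, then move the centres.
        group0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        group1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = mean(group0), mean(group1)
    return group0, group1


cluster_a, cluster_b = kmeans_1d(points)
print(classify(1.5), classify(4.0))
print(cluster_a, cluster_b)
```

The supervised model can name its output because it was given the names; the unsupervised one only recovers the two groups and leaves their interpretation to the analyst.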