! Session 1 & 2: Intro Flashcards

1
Q

Algorithm

A

set of procedures that produces a model when trained on data, e.g., linear regression

2
Q

Model

A

fitted algorithm that has been trained. E.g. a linear regression model that has been trained to predict prices of ‘X’.

3
Q

Parameters

A

internal variables of the model that are adjusted automatically during training

4
Q

Hyperparameters

A

variables set by the user to control the algorithm; they define how it learns from the data
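
To make the split between parameters and hyperparameters concrete, here is a minimal sketch in plain Python. It assumes a one-feature linear regression trained by batch gradient descent; the function name `fit_line` and the toy data are hypothetical and only serve the illustration. `learning_rate` and `epochs` are hyperparameters chosen by the user, while `w` and `b` are parameters adjusted during training.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters: adjusted automatically during training
    n = len(xs)
    for _ in range(epochs):  # epochs: hyperparameter set by the user
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # learning_rate: hyperparameter that controls the step size.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys, learning_rate=0.05, epochs=5000)
```

In the vocabulary of the earlier cards, `fit_line` plays the role of the algorithm and the returned pair `(w, b)` is the trained model.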

5
Q

Data Science

A

= Collection of statistical & ML models that
- supports info extraction from data
- offers insights, causality, predictions

6
Q

ML

A
  • science of programming computers so they learn from data without being explicitly programmed
  • component of data science
  • AI technique for sophisticated cognitive tasks
7
Q

ML Functions

A
  • Descriptive: uses data to explain what happened
  • Predictive: uses data to predict what will happen
  • Prescriptive: uses data to suggest actions
8
Q

ML Subfields

A
  • Natural language processing: machines learn to understand natural language as spoken and written by humans
  • Neural networks: modeled on the human brain
  • Deep learning networks: neural networks with many layers
9
Q

When does ML work well?

A
  • (Large) data is available
  • Problem is dynamic or fluctuating
  • Problem requires predictions or discovering patterns
10
Q

ML Flow

A

Preparation
- Identify Question & Task
- Data Collection
- Data preprocessing
- EDA

Model Development
- Feature & Model Selection
- Splitting Data
- Training with Train Set, validation
- Evaluate with Test set (repeat model development until results are satisfactory)

Communicate results & Deploying Model
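
The "Splitting Data" step can be sketched as follows. This is a minimal illustration in plain Python, assuming a random shuffle and a 60/20/20 split; the helper name `train_val_test_split`, the fractions, and the seed are hypothetical choices, not something the flow itself prescribes.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows and cut them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                # held back for the final evaluation
    val = shuffled[n_test:n_test + n_val]   # used while developing the model
    train = shuffled[n_test + n_val:]       # used to fit the parameters
    return train, val, test


train, val, test = train_val_test_split(range(100))
```

Every row lands in exactly one of the three lists, so the test set stays unseen during training and validation.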

11
Q

Challenges

A
  • Explainability: What are ML models doing? How are decisions made?
  • Bias & unintended outcomes
12
Q

Challenge: Bias & unintended outcomes

A
  • Insufficient data (not enough training data, non-representative data, irrelevant features)
  • Overfitting & underfitting
13
Q

Bias-Variance Tradeoff

A
  • goal: low bias & low variance
  • Bias = error from approximating a real-world phenomenon with a simplified model -> underfit
  • Variance = how much the model's test error changes with variation in the training data; shows up as prediction error on data not previously seen by the model -> overfit
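
For squared error, the tradeoff is usually written as a decomposition of the expected prediction error. The form below is the standard textbook one and rests on assumptions not stated on the card: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, $\hat f$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The noise term cannot be reduced by any model, so the goal of low bias and low variance concerns the first two terms only.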
14
Q

Subcategories of ML Models

A
  1. Based on Training
    - supervised
    - unsupervised
    - reinforcement
    - semi-supervised
  2. Based on working
    - instance based
    - model based
15
Q

Subcategories of ML Models - based on training: supervised and unsupervised

A
  • Supervised = trained with labeled data -> know answers we want
  • Unsupervised = looks for patterns in unlabeled data -> goal: find unknown structures / trends
16
Q

Subcategories of ML Models - based on working

A
  • Instance based = memorizes all previous data, compares new data to it & generalizes based on similarity
  • Model based = learns a model from the training data & predicts labels according to it
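
As an illustration of the instance-based case, here is a minimal k-nearest-neighbours sketch in plain Python. The function name `knn_predict`, the toy points, and the choice of squared Euclidean distance with k = 3 are assumptions made for this example. Note that there is no training step: the stored data is the model.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Keep all training data and take a majority vote among the k nearest points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        # Squared Euclidean distance between a stored point and the query.
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two small groups of 2-D points with labels "a" and "b" (toy data).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(points, labels, (1.2, 1.4))
prediction_near_b = knn_predict(points, labels, (6.4, 6.6))
```

A model-based learner would instead compress the training data into a few fitted parameters and discard the data afterwards.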
17
Q

Supervised ML Models

A
  • Regression: x -> continuous y
  • Classification: x -> discrete y (e.g. binary)
18
Q

Supervised ML Models - Regression

A
  • Linear Regression
  • Neural Networks
19
Q

Unsupervised ML Models - Categories

A
  • Clustering: x -> discrete y
  • Dimensionality Reduction: x -> continuous y
  • Density Reduction: select relevant variables
20
Q

ML versus traditional AI techniques

A
  • trad. AI: static, rule based, no generalization
  • ML: dynamic, data driven, generalization
21
Q

e.g. Chess

A
  • Symbolic AI: sit down with best chess player & put knowledge in PC
  • Statistical AI: Simulate all possible moves & outcomes & take most likely to win
  • ML: Show millions of examples & let program learn
22
Q

Where ML > other AI techniques

A
  • tasks programmers can't describe (handwriting, cognitive reasoning)
  • complex multidimensional problems that can't be solved by numerical reasoning (weather forecast, health care outcomes)
23
Q

3 C’s of ML

A
  • Collaborative filtering: technique for recommendations, same algorithm for different objects, e.g. Amazon & Netflix
  • Classification
  • Clustering
24
Q

Over and underfitting

A
  • Overfitting: learning a function that perfectly explains training data that the model learned from, but doesn’t generalize well - high variance
  • Underfitting: (strongly correlated features) model is not complex enough to capture the underlying trend - high bias
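
The contrast can be seen on a tiny hand-made data set. The sketch below is plain Python under stated assumptions: the training labels follow y = 2x with alternating noise of +1 and -1, the test points lie on the noise-free trend, and the three models (a constant, a least-squares line, a nearest-neighbour memorizer) are hypothetical stand-ins for an underfit, a reasonable fit, and an overfit.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]        # y = 2x with alternating noise of +1 / -1
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]    # noise-free points from the same trend

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    """Underfit: ignores x and always predicts the training mean."""
    return mean_y


# Least-squares slope and intercept computed from the training data.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x


def line_model(x):
    """Reasonable fit: straight line that follows the trend, not the noise."""
    return slope * x + intercept


def memorizer(x):
    """Overfit: returns the label of the nearest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


models = {"constant": constant_model, "line": line_model, "memorizer": memorizer}
train_error = {name: mse(m, train_x, train_y) for name, m in models.items()}
test_error = {name: mse(m, test_x, test_y) for name, m in models.items()}
```

On this data the memorizer reaches zero training error but a higher test error than the line (high variance), while the constant model is worst on both sets (high bias).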
25
Q

Reinforcement learning

A

trains through trial & error with a reward system, e.g. a robot learns to walk (difficult to define task; learning can be dangerous, e.g. with a car)

26
Q

semi-supervised learning

A

Learns from a few labels: clusters the data & propagates the known labels to all instances of each cluster

27
Q

Parametric

A

Any algorithm that learns using a predefined form of the mapping function, e.g. linear regression

28
Q

Non parametric

A

Any algorithm that does not make assumptions about the form of the mapping function, e.g. KNN & SVM

29
Q

Pro and con parametric

A
  • pro: simple, fast, less data
  • con: constrained to parameters / assumptions, limited complexity, poor fit
30
Q

Pro and con non parametric

A
  • pro: flexibility (large number of features), high power, performance
  • con: more data, slower, overfitting
31
Q

Convex

A

A set is convex if, for any 2 points in the set, the set contains the whole line segment that joins them
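
Written formally, with $C$ as an assumed name for the set, the line-segment condition reads:

```latex
\forall\, x, y \in C,\ \forall\, t \in [0, 1]: \quad t\,x + (1 - t)\,y \in C
```

As $t$ runs from 0 to 1, the point $t\,x + (1 - t)\,y$ traces the segment from $y$ to $x$, so the condition says that no part of the segment leaves $C$. A filled disc satisfies it; a ring does not, because a segment between two opposite points crosses the hole.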