Session 1 & 2: Intro Flashcards
Algorithm
set of procedures that produces a model when trained. E.g., linear regression
Model
an algorithm after it has been fitted (trained). E.g. a linear regression model that has been trained to predict prices of ‘X’.
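The algorithm/model distinction can be sketched in a few lines (a minimal pure-Python sketch; the function names and toy data are illustrative, not from the cards):

```python
# The "algorithm" is the fitting procedure; the "model" is its fitted
# result, i.e. the learned slope a and intercept b.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (the algorithm)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b  # the trained model's parameters

# Training produces the model (here the data follow y = 2x + 1 exactly):
a, b = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])

def predict(x):          # the model in use
    return a * x + b
```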
Parameters
internal variables of the model that are adjusted automatically during training
Hyperparameters
variables set by the user to control the algorithm; they define how it learns from the data
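The contrast can be shown with k-nearest-neighbours (a toy pure-Python sketch; the data and the choice k=3 are illustrative): k is a hyperparameter the user sets before training, whereas a linear model's slope and intercept are parameters learned automatically.

```python
# k is a hyperparameter (set by the user); a k-NN classifier learns no
# internal parameters, it just stores the training data.

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of its k nearest 1-D neighbours."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

train = [(1.0, "A"), (1.2, "A"), (3.0, "B"), (3.1, "B"), (2.9, "B")]
knn_predict(train, 1.1, k=3)   # majority of the three nearest points is "A"
```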
Data Science
= Collection of statistical & ML models that
- supports info extraction from data
- offers insights, causality, predictions
ML
- science of programming computers so they learn from data without being explicitly programmed
- component of data science
- AI technique for sophisticated cognitive tasks
ML Functions
- Descriptive: uses data to explain what happened
- Predictive: uses data to predict what will happen
- Prescriptive: uses data to suggest actions
ML Subfields
- Natural language processing: machines learn to understand natural language as spoken and written by humans
- Neural networks: modeled on the human brain
- Deep learning networks: neural networks with many layers
When does ML work well?
- (Large) data is available
- Problem is dynamic or fluctuating
- Problem requires predictions or discovering patterns
ML Flow
Preparation
- Identify Question & Task
- Data Collection
- Data preprocessing
- EDA
Model Development
- Feature & Model Selection
- Splitting Data
- Training with Train Set, validation
- Evaluate with test set (repeat model development until results are satisfying)
Communicate results & Deploying Model
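The splitting step above can be sketched in pure Python (a minimal sketch; the test ratio, seed, and toy data are illustrative assumptions):

```python
import random

# Hold out part of the data for evaluation; train only on the rest.
def train_test_split(data, test_ratio=0.25, seed=0):
    rng = random.Random(seed)          # fixed seed -> reproducible split
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = list(range(20))
train, test = train_test_split(data)   # 15 training items, 5 test items
```

Shuffling before cutting matters: if the data are ordered (e.g. by date), a plain slice would give train and test sets with different distributions.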
Challenges
- Explainability: What are ML models doing? How are decisions made?
- Bias & unintended outcomes
Challenge: Bias & unintended outcomes
- Insufficient data (not enough training data, non-representative data, irrelevant features)
- Overfitting & underfitting
Bias-Variance Tradeoff
- goal: low bias & low variance
- Bias = error introduced by approximating a real-world phenomenon with a simplified model -> underfitting
- Variance = how much the model's test error changes with variation in the training data; prediction error on data not previously seen by the model -> overfitting
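The two extremes can be made concrete (a pure-Python sketch on toy data that is roughly y = x with noise; both "models" are deliberately naive illustrations):

```python
# A constant model underfits (high bias); a model that memorizes the
# training points overfits (high variance).

train = [(0, 0.0), (1, 1.3), (2, 1.8), (3, 3.2)]
test  = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]

mean_y = sum(y for _, y in train) / len(train)

def constant(x):      # too simple: same answer everywhere (underfit)
    return mean_y

def memoriser(x):     # too flexible: recalls nearest training point (overfit)
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

mse(memoriser, train)   # 0.0: perfect on training data
mse(memoriser, test)    # > 0: does not generalize to unseen data
```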
Subcategories of ML Models
- Based on training:
  - supervised
  - unsupervised
  - reinforcement
  - semi-supervised
- Based on working:
  - instance based
  - model based
Subcategories of ML Models - based on training: supervised & unsupervised
- Supervised = trained with labeled data -> know answers we want
- Unsupervised = looks for patterns in unlabeled data -> goal: find unknown structures / trends
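Unsupervised pattern-finding can be sketched with a tiny 1-D 2-means clustering step (a minimal illustration; the data and starting centroids are invented for the example):

```python
# No labels are given; the algorithm discovers the two groups itself.
data = [1.0, 1.2, 0.8, 8.0, 8.2, 7.9]

def two_means(data, c0, c1, steps=10):
    """Alternate assigning points to the nearest centroid and re-averaging."""
    for _ in range(steps):
        g0 = [x for x in data if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in data if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

c0, c1 = two_means(data, 0.0, 10.0)   # centroids settle near the two groups
```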
Subcategories of ML Models - based on working
- Instance based = memorizes the training data, compares new data to it & generalizes based on similarity
- Model based = learns a model from the training data & predicts labels according to it
Supervised ML Models
- Regression: x -> continuous y
- Classification: x -> discrete (binary) y
Supervised ML Models - Regression
- Linear Regression
- Neural Networks
Unsupervised ML Models - Categories
- Clustering: x -> discrete y
- Dimensionality reduction: x -> continuous y
- Feature selection: select relevant variables
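A crude stand-in for selecting relevant variables is dropping near-constant features (a pure-Python sketch; the data and the variance threshold are invented, and real pipelines would use PCA or proper feature-selection methods instead):

```python
# Columns whose values barely vary carry little information; drop them.
rows = [[1.0, 5.0, 0.1],
        [2.0, 5.0, 0.1],
        [3.0, 5.1, 0.1]]

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

cols = list(zip(*rows))                                  # column view
keep = [i for i, c in enumerate(cols) if variance(c) > 0.01]
reduced = [[row[i] for i in keep] for row in rows]       # fewer features
```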
ML versus traditional AI techniques
- trad. AI: static, rule based, no generalization
- ML: dynamic, data driven, generalization
e.g. Chess
- Symbolic AI: sit down with best chess player & put knowledge in PC
- Statistical AI: Simulate all possible moves & outcomes & take most likely to win
- ML: Show millions of examples & let program learn
Where ML > other AI techniques
- tasks programmers can't describe explicitly (handwriting recognition, cognitive reasoning)
- complex multidimensional problems that can't be solved by numerical reasoning (weather forecasting, health care outcomes)
3 C’s of ML
- Collaborative filtering: technique for recommendations, same algorithm for different objects, e.g. Amazon & Netflix
- Classification
- Clustering
Over and underfitting
- Overfitting: learning a function that perfectly explains the training data the model learned from, but doesn’t generalize well - high variance
- Underfitting: model is not complex enough to capture the underlying trend (can happen with strongly correlated features) - high bias
Reinforcement learning
trains through trial & error with a reward system, e.g. a robot learns to walk (drawbacks: tasks are difficult to define, learning can be dangerous, e.g. with cars)
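Trial-and-error with rewards can be sketched with an epsilon-greedy two-armed bandit (a minimal sketch; the payout probabilities, epsilon, and step count are all illustrative assumptions):

```python
import random

def run_bandit(probs, steps=2000, eps=0.1, seed=42):
    """Learn each arm's value purely from trial & error with rewards."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)         # running average reward per arm
    for _ in range(steps):
        if rng.random() < eps:          # explore: try a random arm
            arm = rng.randrange(len(probs))
        else:                           # exploit: pick the best arm so far
            arm = values.index(max(values))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return values

values = run_bandit([0.2, 0.8])   # arm 1's estimated value ends higher
```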
semi-supervised learning
learns from few labels: clusters the data & propagates each label to all instances of its cluster
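The cluster-then-propagate idea can be sketched as follows (a toy pure-Python sketch; the points, labels, and given centroids are invented for illustration):

```python
# Only two points are labelled; every point adopts the known label
# of the cluster it falls into.
points = [1.0, 1.1, 0.9, 9.0, 9.2, 8.8]
labels = {0: "cat", 3: "dog"}           # index -> known label

def propagate(points, labels, centroids):
    assignment = [min(range(len(centroids)),
                      key=lambda i: abs(x - centroids[i]))
                  for x in points]
    # each cluster adopts the label of its labelled member
    cluster_label = {assignment[i]: lab for i, lab in labels.items()}
    return [cluster_label[c] for c in assignment]

propagate(points, labels, centroids=[1.0, 9.0])
# ["cat", "cat", "cat", "dog", "dog", "dog"]
```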
Parametric
Any algorithm that learns using a pre-defined mapping function, e.g. linear regression
Non parametric
Any algorithm that makes no assumptions about the form of the mapping function, e.g. KNN & SVM
Pro and con parametric
- pro: simple, fast, less data
- con: constrained to parameters / assumptions, limited complexity, poor fit
Pro and con non parametric
- pro: flexibility (large number of features), high power, good performance
- con: more data, slower, overfitting
Convex
A set where, given any 2 points in it, it contains the whole line segment that joins them