AI+ML+DS Flashcards

- Deep Learning, Goodfellow
- Artificial Intelligence, Norvig
- Introduction to Statistical Learning in Python, Hastie

1
Q

Knowledge Based Approach to AI

A

Hard-coding knowledge about the world in formal languages so that the computer can reason about statements in those languages automatically, using logical inference rules.

2
Q

Machine Learning

A

Subset of artificial intelligence that enables computers to learn from data and improve their performance on a specific task without being explicitly programmed. Instead of being hand-coded with specific rules, machine learning algorithms can identify patterns and make predictions based on the data they are trained on.

3
Q

Artificial Intelligence

A

A broad field of computer science that aims to create intelligent agents, which are systems that can reason, learn, and act autonomously.

In simpler terms, AI involves developing machines that can think and behave like humans.

4
Q

Deep Learning

A

A subset of machine learning that uses artificial neural networks with multiple layers to learn from data. These neural networks are inspired by the structure and function of the human brain, and they can learn complex patterns and relationships in data that are difficult for traditional machine learning algorithms to capture.

Solves a central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations

5
Q

Representation of Data

A

Refers to the way data is structured and encoded so that it can be processed by machine learning algorithms. The choice of data representation can significantly impact the performance and efficiency of a machine learning model.

6
Q

Feature

A

Each piece of information included in the representation of an observation

7
Q

Representation Learning

A

Use of machine learning to discover not only the mapping from representation to output but also the representation itself

8
Q

Autoencoder

A

Quintessential example of representation learning

Combination of an encoder function, which converts the input data into a different representation, and a decoder function, which converts the new representation back into the original format

Trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but also trained to make the new representation have various nice properties

9
Q

Encoder

A

Converts the input data into a different representation

10
Q

Decoder

A

Converts the new representation (encoded) back into the original format

11
Q

Factors of Variation

A

Concepts or abstractions that help us make sense of the rich variability in the data

12
Q

Multilayer Perceptron (MLP)

A

A type of artificial neural network that consists of multiple layers of interconnected neurons. Each neuron takes a weighted sum of its inputs, applies an activation function, and passes the result to the next layer.

13
Q

Input/Visible Layer

A

First layer of neural network that contains the variables we are able to observe

14
Q

Hidden Layer

A

Layers of a neural network that are not the first (input) or last (output) layers of the network. Extracts increasingly abstract features from the data. Their values are not given in the data; instead, the model must determine which concepts are useful for explaining the relationships in the observed data.

15
Q

Adaptive Linear Element (ADALINE)

A

A type of single-layer artificial neural network used for linear regression and classification tasks. It is similar to the perceptron but uses a least mean squares (LMS) algorithm for training, which allows it to learn more efficiently.

16
Q

Rectified Linear Unit (ReLU)

A

A popular activation function used in artificial neural networks. It introduces non-linearity into the model, allowing it to learn complex patterns.

How does it work?
- If the input (x) is positive, the output is the input itself.
- If the input is zero or negative, the output is zero.
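
The two cases above amount to f(x) = max(0, x). A minimal sketch in plain Python (the function name `relu` is chosen here for illustration):

```python
def relu(x: float) -> float:
    """Return x for positive inputs and 0.0 for zero or negative inputs."""
    return max(0.0, x)


outputs = [relu(v) for v in (-2.0, 0.0, 3.5)]
print(outputs)  # [0.0, 0.0, 3.5]
```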

17
Q

Linear Algebra

A

Branch of mathematics that deals with the study of vectors, matrices, and linear transformations. It provides a framework for solving systems of linear equations and analyzing the properties of linear relationships between variables.

18
Q

Scalar

A

A single number

Written in italics with lowercase variable names

Can be thought of as a matrix with a single entry

19
Q

Vector

A

An array of ordered scalars = 1-D
(each number has a specific location in the array)

Written in lowercase names with bold typeface

Can be thought of as matrices that contain only one column

20
Q

Matrix

A

2-D array of scalars

Written in uppercase names with bold typeface

21
Q

Tensor

A

N-D array of scalars

Written in uppercase names with bold-tensor typeface

bold-tensor typeface is slightly different than our traditional bold typeface

22
Q

Matrix Operation: Transpose

A

Taking the mirror image of a matrix across the main diagonal
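
As a small illustration, the transpose moves entry (i, j) to position (j, i). The sketch below uses nested lists as a stand-in for a matrix; the helper name `transpose` is an assumption for this example.

```python
def transpose(matrix):
    """Swap rows and columns: entry (i, j) moves to (j, i)."""
    return [list(column) for column in zip(*matrix)]


A = [[1, 2, 3],
     [4, 5, 6]]
A_T = transpose(A)
print(A_T)  # [[1, 4], [2, 5], [3, 6]]
```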

23
Q

Main Diagonal

A

Diagonal line on a matrix running down to the right, starting from its upper left corner.

24
Q

Broadcasting

A

The implicit copying of a scalar or smaller array to many locations when performing an element-wise operation with a larger array (e.g., adding a vector to every row of a matrix)
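
A rough sketch of the idea, assuming the common case of adding a vector b to every row of a matrix A (C[i][j] = A[i][j] + b[j]). Array libraries do this copying implicitly; the hypothetical helper `broadcast_add` below spells it out with plain lists.

```python
def broadcast_add(matrix, vector):
    """Add `vector` to every row of `matrix`, as if it were copied once per row."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row must have the same length as the vector")
    return [[a + b for a, b in zip(row, vector)] for row in matrix]


A = [[1, 2, 3],
     [4, 5, 6]]
b = [10, 20, 30]
C = broadcast_add(A, b)
print(C)  # [[11, 22, 33], [14, 25, 36]]
```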

25
Q

Turing Test

A

A test of a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human. It was proposed by Alan Turing in 1950.

26
Q

Control Theory

A

Deals with designing devices that act optimally on the basis of feedback from the environment.

27
Q

4 Approaches to AI

A
  1. Acting Humanly: The Turing Test approach
  2. Thinking Humanly: The cognitive modeling approach
  3. Thinking Rationally: The “laws of thought” approach
  4. Acting Rationally: The rational agent approach
28
Q

Agent

A

Anything that can be viewed as perceiving its environment and acting upon that environment.

Perceives through sensors
Acts through actuators

Agent = architecture + program

29
Q

Rational Agent

A

An Agent that acts so as to achieve the best outcome, or, when there is uncertainty, the best expected outcome

Rational != Perfect

Rationality maximizes expected performance, while perfection maximizes actual performance

30
Q

Standard Model

A

General paradigm shared by AI and several related fields

The focus is on the study and construction of agents that “do the right thing”

What is the “right thing”? -> Defined by the objective we provide to the agent

Control Theory: controller minimizes a cost function
Operations Research: policy maximizes a sum of rewards
Statistics: Decision rule minimizes a loss function
Economics: Where a decision maker maximizes utility or some measure of social welfare

31
Q

Value Alignment Problem

A

The values or objectives put into the machine must be aligned with those of the human

Behaviors are not “unintelligent” or “insane”; they are a logical consequence of defining winning as the sole objective for a machine

32
Q

Algorithm

A

A step-by-step set of instructions or rules to be followed to solve a specific problem or achieve a particular outcome. It’s like a recipe for a computer, providing a precise sequence of actions to perform

33
Q

Incompleteness Theorem

A

States that any consistent formal system that is powerful enough to express basic arithmetic statements is necessarily incomplete. In other words, there will always be true statements within the system that cannot be proven or disproven using the axioms and rules of inference within that system.

34
Q

Computability

A

Capable of being computed by an effective procedure

35
Q

Tractability

A

Refers to the property of a problem that can be solved by an algorithm in a reasonable amount of time. In other words, a tractable problem is one that can be efficiently solved using a computer.

36
Q

Intractable

A

Time required to solve instances of the problem grows exponentially with the size of the instances

37
Q

NP-Completeness

A

A concept in theoretical computer science that refers to the hardest class of decision problems in NP. A proposed solution can be verified in polynomial time, but no polynomial-time algorithm for finding a solution is known; whether one exists is the open P vs. NP question.

38
Q

Decision Theory

A

A field of study that deals with making rational choices in the face of uncertainty. It provides a framework for analyzing and making decisions when there are multiple possible outcomes and associated probabilities.

39
Q

Game Theory

A

A branch of mathematics and economics that studies strategic decision-making among rational agents. It analyzes situations where the outcome for one agent depends on the choices made by other agents.

40
Q

Multiagent Systems

A

Systems composed of multiple autonomous agents that interact with each other and their environment. The agents may cooperate toward a common goal or compete for their own goals, and they can be software programs, robots, or even humans.

41
Q

Operations Research

A

A field of study that applies mathematical and analytical techniques to solve complex problems that arise in business, industry, and other organizations. It focuses on optimizing systems, processes, and decisions to improve efficiency and effectiveness.

42
Q

Singularity

A

A hypothetical future point in time when technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization. It is often associated with the development of artificial intelligence (AI) that surpasses human intelligence in all aspects.

43
Q

Hebbian Learning

A

A principle in neuroscience that states that neurons that fire together wire together. This means that if two neurons are frequently activated simultaneously, the connection between them is strengthened. Conversely, if two neurons are rarely activated simultaneously, the connection between them weakens.

44
Q

Matrix Operation: Dot Product

A

C=AB

If A is of shape m x n and B is of shape n x p, then C is of shape m x p

A must have the same number of columns as B has rows

A = (2x3) B = (3x2) C = (2x2)

A = [[1, 2, 3],
[4, 5, 6]]

B = [[7, 8],
[9, 10],
[11, 12]]

Dot product of A and B = C:
C = [[1*7 + 2*9 + 3*11, 1*8 + 2*10 + 3*12],
[4*7 + 5*9 + 6*11, 4*8 + 5*10 + 6*12]]
= [[58, 64],
[139, 154]]
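
The same 2x3 by 3x2 product can be checked with a short script. This is a minimal sketch using nested lists; the helper name `matmul` is chosen for illustration and is not from the source.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, giving an m x p matrix."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
C = matmul(A, B)
print(C)  # [[58, 64], [139, 154]]
```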

45
Q

Matrix Operation: Element-wise Product

A

Matrix A and Matrix B must be of same shape. Then you multiply the corresponding elements in each matrix for result.
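
A brief sketch of the element-wise (Hadamard) product on nested lists; the helper name `elementwise_product` and the sample matrices are assumptions for illustration.

```python
def elementwise_product(A, B):
    """Multiply corresponding entries of two matrices of the same shape."""
    same_shape = len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    if not same_shape:
        raise ValueError("A and B must have the same shape")
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
C = elementwise_product(A, B)
print(C)  # [[5, 12], [21, 32]]
```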

46
Q

Matrix Operation: Scalar Product

A

Multiplying every value of a matrix by a scalar constant

The resulting matrix has the same shape as the original, with each element multiplied by the scalar

47
Q

Matrix Notation: Rows and Columns

A

m x n

m = # of rows
n = # of columns

48
Q

System of Linear Equations

A

A collection of two or more linear equations with the same variables. The goal is to find values for the variables that satisfy all of the equations simultaneously.

With two variables, each equation represents a straight line on a graph, and the solution to the system is the point(s) where these lines intersect.

49
Q

Identity Matrix

A

I

Matrix that does not change any vector when we multiply that vector by that matrix

The structure of the identity matrix is simple: all the entries along the main diagonal are 1, while all other entries are zero

50
Q

Matrix Operation: Inversion

A

Produces the Inverse Matrix

The inverted matrix is another matrix that, when multiplied by our original matrix, produces the identity matrix.

A * B = B * A = I
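
To make the relation concrete, here is a small sketch for the 2x2 case, using the closed-form inverse and exact fractions to avoid rounding. The helper names `inverse_2x2` and `matmul` and the sample matrix are assumptions for this example.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Invert a 2x2 integer matrix exactly; raise if the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    scale = Fraction(1, det)
    return [[d * scale, -b * scale],
            [-c * scale, a * scale]]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[4, 7],
     [2, 6]]
A_inv = inverse_2x2(A)
identity = [[1, 0], [0, 1]]
print(matmul(A, A_inv) == identity)  # True
print(matmul(A_inv, A) == identity)  # True
```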

51
Q

Linear Algebra: Origin

A

The point specified by the vector of all zeros

52
Q

Linear Combination of vectors

A

Produced by multiplying each vector, in a list of vectors (matrix), by a corresponding scalar coefficient and then adding the results.

53
Q

Matrix: Span

A

The set of all points obtainable by linear combination of the original vectors

54
Q

Column Span/ Range

A

Refers to the set of all linear combinations of the columns of a matrix. It is a subspace of the vector space containing the columns.

55
Q

Linear Dependence

A

Refers to a relationship between a set of vectors where one vector can be expressed as a linear combination of the others.

In other words, if you can find a set of scalars (coefficients), not all zero, such that the linear combination of the vectors equals the zero vector, then the vectors are linearly dependent.

If you can multiply 2 columns by scalar constants and add them together to get another column, those columns are linearly dependent.

56
Q

Linear Independence

A

True of a set of vectors if no vector in the set is a linear combination of the other vectors

57
Q

Matrix Attribute: Square

A

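m = n (the matrix has the same number of rows and columns)

For a matrix to have an inverse, it must be square and all of its columns must be linearly independent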

58
Q

Square Matrix Attribute: Singular

A

A square matrix with linearly dependent columns

A singular matrix has no inverse (its determinant is 0)

59
Q

Matrix: Norm

A

Tools used to measure the size of vectors

Functions mapping vectors to non-negative values

Measures the distance from the origin to the point x

60
Q

Lp norm

A

Generalization of L2 and L1 norm.

L1 norm (p = 1)
L2 norm (p = 2)
Linf norm (p = inf)
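
For reference, the Lp norm of a vector x is (sum_i |x_i|^p)^(1/p) for p >= 1, and the L-inf norm is the limiting case max_i |x_i|. A minimal sketch with a hypothetical helper `lp_norm`:

```python
import math


def lp_norm(x, p):
    """Return the Lp norm of vector x for p >= 1, or the max norm when p is math.inf."""
    if p == math.inf:
        return max(abs(v) for v in x)
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


x = [3.0, -4.0]
print(lp_norm(x, 1))         # 7.0 (sum of absolute values)
print(lp_norm(x, 2))         # 5.0 (Euclidean length)
print(lp_norm(x, math.inf))  # 4.0 (largest absolute component)
```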

61
Q

L2 Norm

A

aka Euclidean Norm

Measures the ordinary Euclidean distance from the origin

62
Q

L1 Norm

A

aka Manhattan Norm

Measures the sum of the absolute values of the components of a vector

Used in ML applications when the difference between zero and nonzero elements is very important

63
Q

L-inf Norm

A

aka Maximum Norm

Measures the maximum absolute value of components of a vector

64
Q

Diagonal Matrix

A

Consists mostly of zeros and has nonzero entries only along the main diagonal

Ex. Identity Matrix

65
Q

Symmetric Matrix

A

Any matrix that is equal to its own transpose

Often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments

66
Q

Unit Vector

A

A vector with unit norm

67
Q

Unit Norm

A

Refers to a vector whose norm is equal to 1. Vector of length 1.

68
Q

Matrix: Orthogonal

A

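In linear algebra, two vectors are orthogonal if they are perpendicular to each other. In other words, if the dot product of two vectors is zero, they are orthogonal.

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal.

Orthogonal matrices are of interest because their inverse is very cheap to compute: it is simply the transpose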

69
Q

Matrix: Orthonormal

A

Vectors that are not only orthogonal but also have unit norm

70
Q

Eigendecomposition

A

One of the most widely used kinds of matrix decomposition

We decompose a matrix into a set of eigenvectors and eigenvalues

71
Q

Eigenvalue

A

A fundamental concept in linear algebra that represents the scaling factor applied to an eigenvector when a linear transformation is applied to it.

In other words, an eigenvalue tells you how much an eigenvector is stretched or shrunk when it is multiplied by the matrix.

72
Q

Eigenvector

A

Nonzero vectors whose direction is unchanged (or exactly reversed, for a negative eigenvalue) when a linear transformation is applied to them.

In other words, when a matrix is multiplied by an eigenvector, the result is a scalar multiple of the original eigenvector.
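
The defining relation A v = lambda v can be checked directly. The sketch below uses a small symmetric matrix picked for illustration, whose eigenpairs are ([1, 1], 3) and ([1, -1], 1); `matvec` is a hypothetical helper.

```python
def matvec(A, v):
    """Multiply matrix A by column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[2, 1],
     [1, 2]]
v1, lambda1 = [1, 1], 3
v2, lambda2 = [1, -1], 1

# For an eigenvector, A v is just the same vector scaled by its eigenvalue.
print(matvec(A, v1) == [lambda1 * x for x in v1])  # True
print(matvec(A, v2) == [lambda2 * x for x in v2])  # True

# A vector that is not an eigenvector changes direction.
print(matvec(A, [1, 0]))  # [2, 1]
```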

73
Q

Positive Definite

A

A matrix whose eigenvalues are all positive

74
Q

Positive Semidefinite

A

A matrix whose eigenvalues are all positive or zero

75
Q

Negative Definite

A

A matrix whose eigenvalues are all negative

76
Q

Negative Semidefinite

A

A matrix whose eigenvalues are all negative or zero

77
Q

Singular Value Decomposition (SVD)

A

Provides another way to factorize (decompose) a matrix, into singular vectors and singular values

Enables us to discover some of the same kind of information as the eigendecomposition reveals but the SVD is more generally applicable

Every real matrix has an SVD, but that is not true of the eigendecomposition

78
Q

Singular Vectors

A

Closely related to singular values and are obtained from the same decomposition. They provide information about the principal directions of the data represented by a matrix.

79
Q

Singular Values

A

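A fundamental concept in linear algebra that provides information about how much a matrix stretches or shrinks space along its principal directions (the singular vectors). They are closely related to eigenvalues: the nonzero singular values of A are the square roots of the nonzero eigenvalues of A^T A.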

80
Q

Moore-Penrose Pseudoinverse

A

Generalization of the inverse matrix for rectangular or non-invertible square matrices.

81
Q

Trace Operator

A

Gives the sum of all the diagonal entries of a matrix

82
Q

Determinant

A

Scalar value associated with a square matrix

Function that maps matrices to real values. Equal to the product of all eigenvalues of the matrix.

Absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space

If determinant=0, space is contracted completely along at least one dimension

If determinant=1, then the transformation preserves volume
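
A quick numeric check for the 2x2 case, where the determinant is ad - bc. The first sample matrix has eigenvalues 3 and 1, so the determinant should be their product; the second has linearly dependent columns. Both matrices and the helper name `det_2x2` are chosen for illustration.

```python
def det_2x2(M):
    """Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c."""
    (a, b), (c, d) = M
    return a * d - b * c


A = [[2, 1],
     [1, 2]]         # eigenvalues 3 and 1
print(det_2x2(A))    # 3, the product of the eigenvalues

S = [[1, 2],
     [2, 4]]         # second column is 2x the first
print(det_2x2(S))    # 0, space is collapsed onto a line
```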

83
Q

Agent Function

A

Describes the agent’s behavior - its mapping from percepts to actions

Policy

84
Q

Agent Program

A

Concrete implementation of the Agent Function.

85
Q

In general, an agent’s choice of action at any given instant can depend on its built-in knowledge and on the entire percept sequence observed to date, but not on anything it hasn’t perceived

A

TODO

86
Q

Performance Measure

A

Captures the notion of desirability by evaluating given sequences of environmental states

Return?

87
Q

As a general rule, it is better to design performance measures according to what one actually wants to be achieved in the environment, rather than according to how one thinks the agent should behave

A

TODO

88
Q

Omniscience

A

The ability to know, for certain, the actual outcome of actions and then act accordingly to achieve the goal.

89
Q

Q: Compare and contrast information gathering and exploration actions

A

Information Gathering actions are actions taken to acquire predefined information while explorative actions are taken to discover new/unknown information

90
Q

Information Gathering Actions

A

Taking actions in order to modify future, predefined, percepts

Ex. Looking both ways before you cross the street

91
Q

Explorative Actions

A

Actions taken to discover new ideas, concepts, or possibilities with a focus to broaden understanding.

92
Q

Task Environment: Components

A

PEAS

  1. Performance
  2. Environment
  3. Actuators
  4. Sensors
93
Q

Task Environment: Properties

A
  1. Observability: Fully vs. Partially
  2. Agent: Single vs Multi
    Multi-agent environments can be competitive or cooperative as well
  3. State-Transition: Deterministic vs Nondeterministic
  4. Interaction: Episodic vs Sequential
  5. Stationarity: Static vs Dynamic
  6. State/Action Space: Discrete vs Continuous
  7. Model: Known vs Unknown
94
Q

Q: Difference between Stochastic and Nondeterministic

A

Stochastic = specified probabilities
(“25% chance of rain tomorrow”)

vs

Nondeterministic = non-specified probabilities
(“chance of rain tomorrow”)

95
Q

Fully- vs Partially-Observable

A

Fully-Observable = if an agent’s sensors give it access to the complete state of the environment at each point in time

Partially-Observable = if the sensors only give partial access

Unobservable = sensors are unable to see any of the environment

96
Q

Single vs Multi Agent Environment

A

The key distinction is whether object B’s behavior is best described as maximizing a performance measure whose value depends on Agent A’s behavior.

If yes, Multi Agent
If no, Single Agent

Multi Agent can be competitive or cooperative

97
Q

Deterministic vs Nondeterministic

A

Deterministic = the next state of the environment is completely determined by the current state and the action executed by the agent(s)

Nondeterministic = otherwise

98
Q

Episodic vs Sequential

A

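Episodic = Agent’s experience is divided into atomic episodes; in each episode the agent receives a percept and performs a single action, and the next episode does not depend on actions taken in previous episodes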

Sequential = The current decision could affect all future decisions

Checking Defective Parts = Episodic
Chess = Sequential

99
Q

Stationary (Static) vs. Nonstationary (Dynamic)

A

Nonstationary = environment can change while agent is operating within it

Stationary = otherwise

100
Q

Discrete vs Continuous

A

Applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent.

101
Q

Known vs. Unknown

A

Known = Agent has a full transition model of the environment: the outcome (or outcome probabilities, if the environment is nondeterministic) of every action is known

Unknown = otherwise

102
Q

Agent Architecture

A

Computing device with physical sensors and actuators that the agent program will run on

103
Q

Agent Program: Types

A
  1. Simple Reflex
  2. Model-Based Reflex
  3. Goal-Based
  4. Utility-Based
104
Q

Simple Reflex Agent

A

Agent that selects actions on the basis of the current percept only, ignoring the rest of the percept history.

Operates via condition-action rules

105
Q

Model-Based Reflex Agent

A

Agent that maintains a transition model and sensor model

106
Q

Goal-Based Agent

A

Agents that utilize a goal describing situations that are desirable - like being at a particular destination

Agent that has a reward signal of 1 only at the end of the episode.

107
Q

Utility-Based Agent

A

Agents that utilize a goal but also a utility function to understand if one state is more desirable than another state on the way to achieving the goal

Value Function

108
Q

Condition-Action Rule

A

if A then B

aka
situation-action rule
production
if-then rule
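
As an illustration, a simple reflex agent can be written as a handful of condition-action rules. The sketch below assumes the textbook two-square vacuum world (squares "A" and "B", percept = (location, status)); the function name and string labels are hypothetical.

```python
def reflex_vacuum_agent(percept):
    """Pick an action from the current percept only, using if-then rules."""
    location, status = percept
    if status == "Dirty":   # rule 1: if the square is dirty then suck
        return "Suck"
    if location == "A":     # rule 2: if clean and at A then move right
        return "Right"
    return "Left"           # rule 3: if clean and at B then move left


for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Clean")]:
    print(percept, "->", reflex_vacuum_agent(percept))
```

Note that the agent keeps no memory of earlier percepts, which is what separates it from a model-based agent.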

109
Q

Transition Model

A

Models “how the world (environment) works”

110
Q

Sensor Model

A

Models how the environment is reflected by the agent’s percepts of it

111
Q

Learning Agent: Components

A
  1. Learning Element
  2. Performance Element
  3. Critic
  4. Problem Generator
112
Q

Learning Element

A

Element responsible for making improvements to the agent

Uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future

113
Q

Performance Element

A

Element responsible for selecting external actions.

Takes percepts and decides on actions

114
Q

Critic

A

Determines how well the agent is doing

115
Q

Problem Generator

A

Responsible for suggesting actions that will lead to new and informative experiences.

Responsible for exploration

116
Q

Agent Components: Representation

A
  1. Atomic
  2. Factored
  3. Structured
  4. Distributed Representation
117
Q

Atomic Representation

A

Each state of the world is indivisible - it has no internal structure

118
Q

Factored Representation

A

Splits each state up into a fixed set of variables/attributes, each of which can have a value.

119
Q

Structured Representation

A

Describes a state in terms of objects and the relationships between them.

Underlies relational databases and first-order logic, first-order probability models, and much of natural language

120
Q

Statistical Learning

A

Vast set of tools for understanding data. These tools can be supervised or unsupervised.

121
Q

Supervised Learning

A

Involves building a statistical model for predicting, or estimating, an output based on one or more inputs

122
Q

Unsupervised Learning

A

There are inputs but no supervising output (unlike supervised learning); nevertheless, we can learn relationships and structure from such data.

123
Q

Regression

A

Predicting a continuous/quantitative output value

Supervised

Problems with a quantitative response

124
Q

Classification

A

Predicting a categorical/qualitative output value

Supervised

Problems with a qualitative response

125
Q

Clustering

A

Grouping elements together according to their observed characteristics

Unsupervised

126
Q

The Four ISL Premises

A
  1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences
  2. Statistical learning should not be viewed as a series of black boxes
  3. While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box
  4. It is presumed that the reader is interested in applying statistical learning methods to real-world problems
127
Q

Input Variables

A

aka predictors, independent variables, features, variables.

Notation: X

128
Q

Output Variables

A

aka response, dependent variable

Notation: Y

129
Q

General Form of Relationship

A

Y = f(X) + epsilon

130
Q

Relationship: epsilon

A

random error term

Nonsystematic Information

Independent of X and has mean zero

131
Q

Relationship: f(X)

A

Systematic Information

132
Q

Q: Why estimate f?

A
  1. Prediction
  2. Inference
133
Q

Reducible Error

A

Error introduced because our estimate of f (the chosen model) is not a perfect estimate of the true f.

Reducible because choosing a more appropriate statistical learning technique can yield an estimate that better represents the true relationship, lowering this error.

134
Q

Irreducible Error

A

Even if our model were perfect, there would still be error because Y is a function of both f(X), the true systematic relationship, AND epsilon, our nonsystematic error term.

Because of epsilon, error is always present in our predictions and is irreducible.

135
Q

Parametric Models

A

Method where the problem of estimating our model is reduced to estimating a set of parameters.

Involve a two-step model-based approach:
1. We make an assumption about the functional form, or shape, of our model (for example linear for a linear model)
2. Select a procedure that uses our training data to fit/train our selected model.

Ex. (1) Linear Regression + (2) Ordinary Least Squares
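
A minimal sketch of the two steps for one predictor: (1) assume the linear form y = b0 + b1 * x, then (2) estimate b0 and b1 by ordinary least squares using the closed-form solution. The helper name `fit_simple_ols` and the toy data are assumptions for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)  # 1.0 2.0
```

Estimating f here reduces to estimating just two numbers, which is the defining feature of a parametric method.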

136
Q

Nonparametric Models

A

Methods that do not make explicit assumptions about the functional form our model will take

Instead, they seek an estimate that gets as close to the data points as possible while still being generalizable (not overfit)

Because there is no functional form assumption, a greater amount of data is required in order to obtain an accurate estimate for f.

137
Q

Overfitting

A

Artifact of model and training process

If the model is too flexible, it could learn the noise/errors of the data and not the overall relationship and therefore not be a generalized solution

138
Q

Trade-Off Between Prediction Accuracy and Model Interpretability

A

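More flexible models (often nonparametric) can achieve higher prediction accuracy, but they tend to be less interpretable. More restrictive models (e.g., linear regression) are easier to interpret, which matters when inference is the goal. Higher flexibility does not guarantee higher accuracy, because a flexible model can overfit.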

139
Q

Semi-Supervised Learning

A

In this method, a model is trained on a dataset that contains a small amount of labeled data and a large amount of unlabeled data.

One common approach (self-training) is to train a model on the small batch of labeled data and then use that model to label the unlabeled data.

140
Q

How can we measure how well model predictions actually match the observed data?

How can we measure the quality of fit?

A

Fundamentally, we need to measure the distance between unseen data (test data) and our model’s predictions on said data.

Regression:
- MSE = mean squared error

Classification:
- Error Rate

141
Q

Mean Squared Error (MSE)

A

1/n*sum( (y_true-y_pred)^2 )
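
A short sketch of the formula in plain Python; the function name and the sample values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
# Squared errors are 1, 0, and 4, so the mean is 5/3.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```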

142
Q

Classification: Error Rate

A

1/n*sum( I(y_true!=y_pred) )

Computes the fraction of incorrect classifications

We want this value to be minimized

Where I(y_true!=y_pred) = indicator variable and takes the form of 0 or 1 depending on y_true!=y_pred result.

I = 1 if y_true!=y_pred (misclassed)
I = 0 if y_true =y_pred (correct)
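
The indicator-based formula can be sketched as follows; the function name and the labels are made up for illustration.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # The comparison t != p plays the role of the indicator I (1 if misclassified).
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "cat", "dog", "cat"]
rate = error_rate(y_true, y_pred)
print(rate)  # 0.25 (one of four predictions is wrong)
```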

143
Q

Bias

A

Model Driven Error

There is a true relationship in the data that we are trying to model and every model we use will not model that relationship perfectly, but some will model it better than others.

If we have a nonlinear relationship in our data and try to build a model using linear regression, it will not model it very well and have high bias.

We could use a nonparametric model and estimate the relationship much better.

144
Q

Variance

A

Data Driven Error

Variance refers to the amount by which our model would change if we estimated it using a different training data set.

Fundamentally, if we use different training data sets, we will generate a different model - and each of these models should be similar given similar training data

High Variance = small changes in training data result in large changes to our model

145
Q

Bias-Variance Trade-Off

A

Bias = Model Driven Error
Variance = Data Driven Error

A model that estimates the relationship very well (low bias) will be more susceptible to small changes in the data (high variance)

While models that estimate the relationship more generally (high bias) will be less susceptible to small changes in the data (low variance)

As a general rule, as we use more flexible methods, the variance will increase, and the bias will decrease.
