AI+ML+DS Flashcards

- Deep Learning (Goodfellow, Bengio, Courville)
- Artificial Intelligence: A Modern Approach (Russell, Norvig)
- An Introduction to Statistical Learning with Applications in Python (James, Witten, Hastie, Tibshirani)

1
Q

Knowledge Based Approach to AI

A

Hard-coding knowledge about the world in formal languages so that a computer system can reason over those statements automatically using logical inference rules.

2
Q

Machine Learning

A

Subset of artificial intelligence that enables computers to learn from data and improve their performance on a specific task without being explicitly programmed. Instead of being hand-coded with specific rules, machine learning algorithms can identify patterns and make predictions based on the data they are trained on.

3
Q

Artificial Intelligence

A

A broad field of computer science that aims to create intelligent agents, which are systems that can reason, learn, and act autonomously.

In simpler terms, AI involves developing machines that can think and behave like humans.

4
Q

Deep Learning

A

A subset of machine learning that uses artificial neural networks with multiple layers to learn from data. These neural networks are inspired by the structure and function of the human brain, and they can learn complex patterns and relationships in data that are difficult for traditional machine learning algorithms to capture.

Solves a central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations

5
Q

Representation of Data

A

Refers to the way data is structured and encoded so that it can be processed by machine learning algorithms. The choice of data representation can significantly impact the performance and efficiency of a machine learning model.

6
Q

Feature

A

Each piece of information included in the representation of an observation

7
Q

Representation Learning

A

Use of machine learning to discover not only the mapping from representation to output but also the representation itself

8
Q

Autoencoder

A

Quintessential example of representation learning

Combination of an encoder function, which converts the input data into a different representation, and a decoder function, which converts the new representation back into the original format

Trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but also trained to make the new representation have various nice properties

9
Q

Encoder

A

Converts the input data into a different representation

10
Q

Decoder

A

Converts the new representation (encoded) back into the original format

11
Q

Factors of Variation

A

Concepts of abstractions that help us make sense of the rich variability in the data

12
Q

Multilayer Perceptron (MLP)

A

A type of artificial neural network that consists of multiple layers of interconnected neurons. Each neuron takes a weighted sum of its inputs, applies an activation function, and passes the result to the next layer.

13
Q

Input/Visible Layer

A

First layer of neural network that contains the variables we are able to observe

14
Q

Hidden Layer

A

Layers of a neural network that are not the first (input) or last (output) layers of the network. Extracts increasingly abstract features from the data. Their values are not given in the data; instead, the model must determine which concepts are useful for explaining the relationships in the observed data.

15
Q

Adaptive Linear Element (ADALINE)

A

A type of single-layer artificial neural network used for linear regression and classification tasks. It is similar to the perceptron but uses a least mean squares (LMS) algorithm for training, which allows it to learn more efficiently.

16
Q

Rectified Linear Unit (ReLU)

A

A popular activation function used in artificial neural networks. It introduces non-linearity into the model, allowing it to learn complex patterns.

How does it work?
- If the input (x) is positive, the output is the input itself.
- If the input is negative, the output is zero.
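Equivalently, f(x) = max(0, x). A minimal NumPy sketch (the array values are just an illustrative example):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative inputs become 0, positive inputs pass through
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # -> [0. 0. 0. 1.5 3.]
```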

17
Q

Linear Algebra

A

Branch of mathematics that deals with the study of vectors, matrices, and linear transformations. It provides a framework for solving systems of linear equations and analyzing the properties of linear relationships between variables.

18
Q

Scalar

A

A single number

Written in italics with lowercase variable names

Can be thought of as a matrix with a single entry

19
Q

Vector

A

A 1-D ordered array of scalars
(each number has a specific location, or index, in the array)

Written in lowercase names with bold typeface

Can be thought of as matrices that contain only one column

20
Q

Matrix

A

2-D array of scalars

Written in uppercase names with bold typeface

21
Q

Tensor

A

N-D array of scalars

Written in uppercase names with bold-tensor typeface

bold-tensor typeface is slightly different than our traditional bold typeface

22
Q

Matrix Operation: Transpose

A

Taking the mirror image of a matrix across the main diagonal

23
Q

Main Diagonal

A

Diagonal line on a matrix running down to the right, starting from its upper left corner.

24
Q

Broadcasting

A

The implicit copying of a vector (or scalar) to many locations when performing a matrix operation, so that arrays of different shapes can be combined element-wise (for example, adding a vector b to every row of a matrix).
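A minimal NumPy illustration (the array values are an arbitrary example):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,)

# b is implicitly copied ("broadcast") across each row of A
print(A + b)
# [[11 22 33]
#  [14 25 36]]
```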

25
Turing Test
A test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. It was proposed by Alan Turing in 1950.
26
Control Theory
Deals with designing devices that act optimally on the basis of feedback from the environment.
27
4 Approaches to AI
1. Acting humanly: the Turing Test approach
2. Thinking humanly: the cognitive modeling approach
3. Thinking rationally: the "laws of thought" approach
4. Acting rationally: the rational agent approach
28
Agent
Anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
Agent = architecture + program
29
Rational Agent
An agent that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.
Rational != perfect: rationality maximizes expected performance, while perfection maximizes actual performance.
30
Standard Model
General framework for representing and analyzing intelligent systems: the study and construction of agents that "do the right thing," where what counts as the "right thing" is defined by the objective given to the agent.
The same idea appears across fields:
- Control theory: a controller minimizes a cost function
- Operations research: a policy maximizes a sum of rewards
- Statistics: a decision rule minimizes a loss function
- Economics: a decision maker maximizes utility or some measure of social welfare
31
Value Alignment Problem
The values or objectives put into the machine must be aligned with those of the human. Misaligned behaviors are not "unintelligent" or "insane"; they are a logical consequence of defining winning as the sole objective for a machine.
32
Algorithm
A step-by-step set of instructions or rules to be followed to solve a specific problem or achieve a particular outcome. It's like a recipe for a computer, providing a precise sequence of actions to perform
33
Incompleteness Theorem
States that any consistent formal system that is powerful enough to express basic arithmetic statements is necessarily incomplete. In other words, there will always be true statements within the system that cannot be proven or disproven using the axioms and rules of inference within that system.
34
Computability
Capable of being computed by an effective procedure
35
Tractability
Refers to the property of a problem that can be solved by an algorithm in a reasonable amount of time. In other words, a tractable problem is one that can be efficiently solved using a computer.
36
Intractable
Time required to solve instances of the problem grows exponentially with the size of the instances
37
NP-Completeness
A concept in theoretical computer science that refers to a class of decision problems that are considered to be among the most difficult to solve. These problems are characterized by the fact that while it is relatively easy to verify a solution, it is extremely difficult to find a solution.
38
Decision Theory
A field of study that deals with making rational choices in the face of uncertainty. It provides a framework for analyzing and making decisions when there are multiple possible outcomes and associated probabilities.
39
Game Theory
A branch of mathematics and economics that studies strategic decision-making among rational agents. It analyzes situations where the outcome for one agent depends on the choices made by other agents.
40
Multiagent Systems
Systems composed of multiple autonomous agents that interact with each other and their environment to achieve a common goal. These agents can be software programs, robots, or even humans.
41
Operations Research
A field of study that applies mathematical and analytical techniques to solve complex problems that arise in business, industry, and other organizations. It focuses on optimizing systems, processes, and decisions to improve efficiency and effectiveness.
42
Singularity
A hypothetical future point in time when technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization. It is often associated with the development of artificial intelligence (AI) that surpasses human intelligence in all aspects.
43
Hebbian Learning
A principle in neuroscience that states that neurons that fire together wire together. This means that if two neurons are frequently activated simultaneously, the connection between them is strengthened. Conversely, if two neurons are rarely activated simultaneously, the connection between them weakens.
44
Matrix Operation: Dot Product
C = AB
If A is of shape m x n and B is of shape n x p, then C is of shape m x p; A must have the same number of columns as B has rows.
Example: A (2x3), B (3x2), C (2x2)
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = AB = [[1*7+2*9+3*11, 1*8+2*10+3*12], [4*7+5*9+6*11, 4*8+5*10+6*12]] = [[58, 64], [139, 154]]
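A quick NumPy check of the same example:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # shape (3, 2)

C = A @ B                        # matrix product, shape (2, 2)
print(C)
# [[ 58  64]
#  [139 154]]
```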
45
Matrix Operation: Element-wise Product
Also called the Hadamard product. Matrices A and B must be the same shape; each element of the result is the product of the corresponding elements of A and B.
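A small NumPy illustration (values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# * multiplies corresponding elements (requires matching shapes)
print(A * B)
# [[ 10  40]
#  [ 90 160]]
```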
46
Matrix Operation: Scalar Product
Multiplying every element of a matrix by a scalar constant. The resulting matrix has the same shape as the original, with each element multiplied by the scalar.
47
Matrix Notation: Rows and Columns
m x n
m = number of rows
n = number of columns
48
System of Linear Equations
A collection of two or more linear equations with the same variables. Each equation represents a straight line on a graph, and the goal is to find values for the variables that satisfy all of the equations simultaneously, i.e., the point(s) where the lines intersect.
49
Identity Matrix
I
The matrix that does not change any vector when we multiply that vector by it. The structure of the identity matrix is simple: all entries along the main diagonal are 1, while all other entries are zero.
50
Matrix Operation: Inversion
Produces the inverse matrix: another matrix that, when multiplied by our original matrix, produces the identity matrix.
A^-1 * A = A * A^-1 = I
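A short NumPy check (the matrix here is just an arbitrary invertible example):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])       # an arbitrary invertible matrix

A_inv = np.linalg.inv(A)
print(A_inv @ A)                 # approximately the 2x2 identity matrix
```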
51
Linear Algebra: Origin
The point specified by the vector of all zeros
52
Linear Combination of vectors
Produced by multiplying each vector, in a list of vectors (matrix), by a corresponding scalar coefficient and then adding the results.
53
Matrix: Span
The set of all points obtainable by linear combination of the original vectors
54
Column Span/ Range
Refers to the set of all linear combinations of the columns of a matrix. It is a subspace of the vector space containing the columns.
55
Linear Dependence
Refers to a relationship among a set of vectors in which one vector can be expressed as a linear combination of the others. Equivalently, if you can find a set of scalars (coefficients), not all zero, such that a linear combination of the vectors equals the zero vector, then the vectors are linearly dependent. For example, if multiplying two columns by scalar constants and adding them produces another column, those columns are linearly dependent.
56
Linear Independence
True of a set of vectors if no vector in the set is a linear combination of the other vectors
57
Matrix Attribute: Square
m = n (same number of rows and columns). For the matrix inverse to exist, the matrix must be square and all of its columns must be linearly independent.
58
Square Matrix Attribute: Singular
A square matrix with linearly dependent columns; it has no inverse.
59
Matrix: Norm
Functions used to measure the size of vectors: they map vectors to non-negative values. The norm of a vector x measures the distance from the origin to the point x.
60
Lp norm
Generalization of the L1 and L2 norms: ||x||_p = ( sum_i |x_i|^p )^(1/p) for p >= 1.
L1 norm (p = 1), L2 norm (p = 2), L-inf norm (p -> inf)
61
L2 Norm
aka Euclidean Norm Measures the ordinary Euclidean distance from the origin
62
L1 Norm
aka Manhattan Norm Measures the sum of the absolute values of the components of a vector Used in ML applications when the difference between zero and nonzero elements is very important
63
L-inf Norm
aka Maximum Norm Measures the maximum absolute value of components of a vector
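A quick NumPy illustration of the three common norms (the vector is an arbitrary example):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, ord=1))       # L1 (Manhattan): 7.0
print(np.linalg.norm(x, ord=2))       # L2 (Euclidean): 5.0
print(np.linalg.norm(x, ord=np.inf))  # L-inf (max absolute value): 4.0
```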
64
Diagonal Matrix
Consists mostly of zeros and has nonzero entries only along the main diagonal. Example: the identity matrix.
65
Symmetric Matrix
Any matrix that is equal to its own transpose (A = A^T). Often arises when the entries are generated by some function of two arguments that does not depend on the order of the arguments.
66
Unit Vector
A vector with unit norm
67
Unit Norm
Refers to a vector whose norm is equal to 1. Vector of length 1.
68
Matrix: Orthogonal
Two vectors are orthogonal if their dot product is zero, i.e., they are perpendicular to each other. An orthogonal matrix is a square matrix whose rows and columns are mutually orthonormal, so A^T * A = A * A^T = I and therefore A^-1 = A^T. Orthogonal matrices are of interest because their inverse is very cheap to compute.
69
Matrix: Orthonormal
Vectors that are not only orthogonal but also have unit norm
70
Eigendecomposition
One of the most widely used kinds of matrix decomposition: we decompose a matrix into a set of eigenvectors and eigenvalues. For a diagonalizable matrix A with eigenvector matrix V and eigenvalue vector lambda, A = V diag(lambda) V^-1.
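A small NumPy sketch (using an arbitrary symmetric matrix so the eigenvalues are real):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # arbitrary symmetric matrix

eigenvalues, V = np.linalg.eig(A)
print(eigenvalues)               # e.g. [3. 1.]

# Reconstruct A from its eigendecomposition: A = V diag(lambda) V^-1
A_rebuilt = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt)) # True
```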
71
Eigenvalue
A fundamental concept in linear algebra that represents the scaling factor associated with a vector when a linear transformation is applied to it. In other words, an eigenvalue tells you how much a vector is stretched or shrunk when it is multiplied by a matrix.
72
Eigenvector
Vectors that remain unchanged in direction when a linear transformation is applied to them. In other words, when a matrix is multiplied by an eigenvector, the result is a scalar multiple of the original eigenvector.
73
Positive Definite
A matrix whose eigenvalues are all positive
74
Positive Semidefinite
A matrix whose eigenvalues are all positive or zero
75
Negative Definite
A matrix whose eigenvalues are all negative
76
Negative Semidefinite
A matrix whose eigenvalues are all negative or zero
77
Singular Value Decomposition (SVD)
Provides another way to factorize (decompose) a matrix, into singular vectors and singular values: A = U D V^T.
Enables us to discover some of the same kind of information as the eigendecomposition reveals, but the SVD is more generally applicable: every real matrix has an SVD, while the same is not true of the eigendecomposition.
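A minimal NumPy sketch on an arbitrary non-square matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # shape (2, 3): not square, so no eigendecomposition

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# U: left singular vectors, s: singular values, Vt: right singular vectors (transposed)
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt)) # True
```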
78
Singular Vectors
Closely related to singular values and are obtained from the same decomposition. They provide information about the principal directions of the data represented by a matrix.
79
Singular Values
A fundamental concept in linear algebra that provide information about the scaling and rotation properties of a matrix. They are closely related to eigenvalues and eigenvectors.
80
Moore-Penrose Pseudoinverse
Generalization of the inverse matrix for rectangular or non-invertible square matrices.
81
Trace Operator
Gives the sum of all the diagonal entries of a matrix
82
Determinant
Scalar value associated with a square matrix: a function that maps matrices to real values. Equal to the product of all eigenvalues of the matrix.
The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space:
- If the determinant is 0, space is contracted completely along at least one dimension
- If the determinant is 1, the transformation preserves volume
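A quick NumPy check that the determinant equals the product of the eigenvalues (arbitrary example matrix):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])       # arbitrary example matrix

det = np.linalg.det(A)
eigenvalues = np.linalg.eig(A)[0]

print(det)                       # approximately 10.0
print(np.prod(eigenvalues))      # approximately 10.0 (product of eigenvalues)
```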
83
Agent Function
Describes the agent's behavior: its mapping from percept sequences to actions. (Policy)
84
Agent Program
A concrete implementation of the agent function.
85
In general, an agent's choice of action at any given instant can depend on its built-in knowledge and on the entire percept sequence observed to date, but not on anything it hasn't perceived
TODO
86
Performance Measure
Captures the notion of desirability by evaluating any given sequence of environment states. (Return?)
87
As a general rule, it is better to design performance measures according to what one actually wants to be achieved in the environment, rather than according to how one thinks the agent should behave
TODO
88
Omniscience
The ability to know, for certain, the actual outcome of actions and then act accordingly to achieve goal.
89
Q: Compare and contrast information gathering and exploration actions
Information Gathering actions are actions taken to acquire predefined information while explorative actions are taken to discover new/unknown information
90
Information Gathering Actions
Taking actions in order to modify future, predefined, percepts.
Example: looking both ways before you cross the street.
91
Explorative Actions
Actions taken to discover new ideas, concepts, or possibilities with a focus to broaden understanding.
92
Task Environment: Components
PEAS:
1. Performance measure
2. Environment
3. Actuators
4. Sensors
93
Task Environment: Properties
1. Observability: fully vs. partially observable
2. Agents: single-agent vs. multi-agent (multi-agent environments can be competitive or cooperative)
3. State transitions: deterministic vs. nondeterministic
4. Interaction: episodic vs. sequential
5. Stationarity: static vs. dynamic
6. State/action space: discrete vs. continuous
7. Model: known vs. unknown
94
Q: Difference between Stochastic and Nondeterministic
Stochastic = specified probabilities ("25% chance of rain tomorrow") vs Nondeterministic = non-specified probabilities ("chance of rain tomorrow")
95
Fully- vs Partially-Observable
Fully observable = an agent's sensors give it access to the complete state of the environment at each point in time.
Partially observable = the sensors give only partial access.
Unobservable = the sensors are unable to see any of the environment.
96
Single vs Multi Agent Environment
The key distinction is whether object B's behavior is best described as maximizing a performance measure whose value depends on agent A's behavior.
If yes: multi-agent. If no: single-agent.
Multi-agent environments can be competitive or cooperative.
97
Deterministic vs Nondeterministic
Deterministic = the next state of the environment is completely determined by the current state and the action executed by the agent(s) Nondeterministic = otherwise
98
Episodic vs Sequential
Episodic = the agent's experience is divided into atomic episodes of one time-step each.
Sequential = the current decision could affect all future decisions.
Examples: checking defective parts = episodic; chess = sequential.
99
Stationary (Static) vs. Nonstationary (Dynamic)
Nonstationary = environment can change while agent is operating within it Stationary = otherwise
100
Discrete vs Continuous
Describes whether the state of the environment, time, and the agent's actions are handled as discrete or continuous quantities.
101
Known vs. Unknown
Known = Agent has a full transition model of the environment where the outcome of every action is KNOWN Unknown = otherwise
102
Agent Architecture
The computing device, with physical sensors and actuators, that the agent program runs on.
103
Agent Program: Types
1. Simple reflex
2. Model-based reflex
3. Goal-based
4. Utility-based
104
Simple Reflex Agent
Agent that selects actions on the basis of the current percept only, ignoring the rest of the percept history. Operates via condition-action rules
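A minimal sketch of the idea, using the classic two-square vacuum world as a hypothetical example (the location/status/action names are assumptions for illustration):

```python
def simple_reflex_vacuum_agent(percept):
    # percept = (location, status); all percept history is ignored
    location, status = percept
    # condition-action rules
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    elif location == "B":
        return "Left"

print(simple_reflex_vacuum_agent(("A", "Dirty")))  # Suck
print(simple_reflex_vacuum_agent(("A", "Clean")))  # Right
```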
105
Model-Based Reflex Agent
Agent that maintains a transition model and sensor model
106
Goal-Based Agent
Agents that utilize a goal describing situations that are desirable, like being at a particular destination.
(Analogous to an agent that has a reward signal of 1 only at the end of the episode.)
107
Utility-Based Agent
Agents that utilize a goal but also a utility function, to judge whether one state is more desirable than another on the way to achieving the goal.
(Analogous to a value function.)
108
Condition-Action Rule
if A then B
Also known as: situation-action rule, production, if-then rule
109
Transition Model
Models "how the world (environment) works"
110
Sensor Model
Models how the environment is reflected by the agent's percepts of it
111
Learning Agent: Components
1. Learning element
2. Performance element
3. Critic
4. Problem generator
112
Learning Element
Element responsible for making improvements to the agent. Uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future.
113
Performance Element
Element responsible for selecting external actions. Takes percepts and decides on actions
114
Critic
Determines how well the agent is doing
115
Problem Generator
Responsible for suggesting actions that will lead to new and informative experiences. Responsible for exploration
116
Agent Components: Representation
1. Atomic
2. Factored
3. Structured
4. Distributed representation
117
Atomic Representation
Each state of the world is indivisible - it has no internal structure
118
Factored Representation
Splits each state up into a fixed set of variables/attributes, each of which can have a value.
119
Structured Representation
Represents states in terms of objects and the relationships between them. Underlies relational databases, first-order logic, first-order probability models, and much of natural language.
120
Statistical Learning
Vast set of tools for understanding data. These tools can be supervised or unsupervised.
121
Supervised Learning
Involves building a statistical model for predicting, or estimating, an output based on one or more inputs
122
Unsupervised Learning
There are inputs but no supervising output (like supervised learning); nevertheless, we can learn relationships and structure from such data.
123
Regression
Predicting a continuous/quantitative output value.
Supervised problems with a quantitative response.
124
Classification
Predicting a categorical/qualitative output value.
Supervised problems with a qualitative response.
125
Clustering
Grouping elements together according to their observed characteristics.
Unsupervised.
126
The Four ISL Premises
1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences.
2. Statistical learning should not be viewed as a series of black boxes.
3. While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box.
4. It is presumed that the reader is interested in applying statistical learning methods to real-world problems.
127
Input Variables
aka predictors, independent variables, features, variables. Notation: X
128
Output Variables
aka response, dependent variable Notation: Y
129
General Form of Relationship
Y = f(X) + epsilon
130
Relationship: epsilon
The random error term: nonsystematic information. Independent of X and has mean zero.
131
Relationship: f(X)
Systematic Information
132
Q: Why estimate f?
1. Prediction 2. Inference
133
Reducible Error
Error introduced by the chosen model. Reducible because choosing a different model can decrease the error if that model better represents the true relationship.
134
Irreducible Error
Even if our estimate of f were perfect, there would still be error, because Y is a function of both f(X), the systematic information, and epsilon, the nonsystematic error term. Since epsilon cannot be predicted from X, this error is always present in our model and is irreducible.
135
Parametric Models
Methods where the problem of estimating our model is reduced to estimating a set of parameters. They involve a two-step, model-based approach:
1. Make an assumption about the functional form, or shape, of the model (for example, linear for a linear model).
2. Select a procedure that uses the training data to fit/train the selected model.
Example: (1) linear regression + (2) ordinary least squares.
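A small NumPy sketch of that two-step example: assume a linear form, then fit its parameters by ordinary least squares (the data are synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * X + rng.normal(0, 1, size=100)    # synthetic data: linear trend plus noise

# Step 1: assume the functional form Y = beta0 + beta1 * X (linear)
# Step 2: estimate beta0 and beta1 by ordinary least squares
A = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)                                       # roughly [2., 3.]
```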
136
Nonparametric Models
Methods that do not make explicit assumptions about the functional form the model will take. Instead, they seek an estimate that gets as close to the data points as possible while still being generalizable (not overfit). Because there is no functional-form assumption, a much larger amount of data is required to obtain an accurate estimate of f.
137
Overfitting
An artifact of the model and training process: if the model is too flexible, it can learn the noise/errors of the training data rather than the overall relationship, and therefore fail to be a generalized solution.
138
Trade-Off Between Prediction Accuracy and Model Interpretability
In general, the more flexible (and potentially more accurate) a model is, the less interpretable it is, because such models are usually nonparametric rather than parametric.
139
Semi-Supervised Learning
In this method, a model is trained on a dataset that contains a small amount of labeled data and a large amount of unlabeled data. Essentially using a model to learn on a small batch of labeled data and then using said model to label the unlabeled data.
140
How can we measure how well model predictions actually match the observed data? How can we measure the quality of fit?
Fundamentally, we need to measure how far our model's predictions are from the true responses, ideally on unseen (test) data.
Regression: mean squared error (MSE)
Classification: error rate
141
Mean Squared Error (MSE)
1/n*sum( (y_true-y_pred)^2 )
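A direct NumPy translation of that formula (the arrays are example values):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # average squared prediction error
print(mse)                              # 0.375
```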
142
Classification: Error Rate
1/n*sum( I(y_true!=y_pred) )
Computes the fraction of incorrect classifications; we want this value to be minimized.
I(y_true!=y_pred) is an indicator variable that takes the value 0 or 1 depending on the comparison:
I = 1 if y_true != y_pred (misclassified)
I = 0 if y_true == y_pred (correct)
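And the corresponding computation in NumPy (the labels are example values):

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

error_rate = np.mean(y_true != y_pred)  # fraction of misclassified observations
print(error_rate)                       # 0.4
```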
143
Bias
Model-driven error.
There is a true relationship in the data that we are trying to model; no model will capture that relationship perfectly, but some will model it better than others. If the data have a nonlinear relationship and we fit a linear regression, the model will not capture it well and will have high bias; a nonparametric model could estimate the relationship much better.
144
Variance
Data-driven error.
Variance refers to the amount by which our model would change if we estimated it using a different training data set. Fundamentally, different training sets produce different fitted models, and these models should be similar given similar training data.
High variance = small changes in the training data result in large changes to our model.
145
Bias-Variance Trade-Off
Bias = model-driven error; variance = data-driven error.
A model that estimates the relationship very closely (low bias) will be more susceptible to small changes in the data (high variance), while a model that estimates the relationship more generally (high bias) will be less susceptible to small changes in the data (low variance).
As a general rule, as we use more flexible methods, the variance will increase and the bias will decrease.