AI+ML+DS Flashcards

Question

Turing Test

Answer 1

A test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. It was proposed by Alan Turing in 1950.

Answer 2

Deals with designing devices that act optimally on the basis of feedback from the environment.

Answer 3

1. Acting Humanly: the Turing Test approach
2. Thinking Humanly: the cognitive modeling approach
3. Thinking Rationally: the "laws of thought" approach
4. Acting Rationally: the rational agent approach

Answer 4

Anything that can be viewed as perceiving its environment and acting upon that environment.
Perceives through sensors.
Acts through actuators.
Agent = architecture + program

Answer 5

An agent that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.
Rational != Perfect
Rationality maximizes expected performance, while perfection maximizes actual performance.
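The gap between expected and actual performance can be illustrated with a minimal sketch. The action names, probabilities, and utilities below are hypothetical and chosen only for illustration.

```python
# Hypothetical decision problem: every number and name here is an assumption.
def expected_utility(outcomes):
    """Sum of probability * utility over an action's possible outcomes."""
    return sum(p * u for p, u in outcomes)


# Each action maps to a list of (probability, utility) pairs: rain first, dry second.
actions = {
    "take_umbrella": [(0.3, 8), (0.7, 6)],
    "leave_umbrella": [(0.3, 0), (0.7, 10)],
}


def rational_choice(actions):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


best = rational_choice(actions)
print(best)
```

Under these assumed numbers the rational choice is to leave the umbrella (expected utility about 7.0 versus 6.6). If it then rains, the actual utility is 0: the choice was rational but not perfect.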

Answer 6

General framework for representing and analyzing systems.
Focuses on the study and construction of agents that "do the right thing".
What is the "right thing"? -> Defined by the standard model:
Control Theory: a controller minimizes a cost function
Operations Research: a policy maximizes a sum of rewards
Statistics: a decision rule minimizes a loss function
Economics: a decision maker maximizes utility or some measure of social welfare

Answer 7

The values or objectives put into the machine must be aligned with those of the human.
Behaviors are not "unintelligent" or "insane"; they are a logical consequence of defining winning as the sole objective for a machine.

Answer 8

A step-by-step set of instructions or rules to be followed to solve a specific problem or achieve a particular outcome. It's like a recipe for a computer, providing a precise sequence of actions to perform

Answer 9

States that any consistent formal system that is powerful enough to express basic arithmetic statements is necessarily incomplete. In other words, there will always be true statements within the system that cannot be proven or disproven using the axioms and rules of inference within that system.

Answer 10

Capable of being computed by an effective procedure

Answer 11

Refers to the property of a problem that can be solved by an algorithm in a reasonable amount of time. In other words, a tractable problem is one that can be efficiently solved using a computer.

Answer 12

Time required to solve instances of the problem grows exponentially with the size of the instances

Answer 13

A concept in theoretical computer science that refers to a class of decision problems that are considered to be among the most difficult to solve. These problems are characterized by the fact that a proposed solution can be verified efficiently (in polynomial time), while no efficient algorithm for finding a solution is known.

Answer 14

A field of study that deals with making rational choices in the face of uncertainty. It provides a framework for analyzing and making decisions when there are multiple possible outcomes and associated probabilities.

Answer 15

A branch of mathematics and economics that studies strategic decision-making among rational agents. It analyzes situations where the outcome for one agent depends on the choices made by other agents.

Answer 16

Systems composed of multiple autonomous agents that interact with each other and their environment to achieve a common goal. These agents can be software programs, robots, or even humans.

Answer 17

A field of study that applies mathematical and analytical techniques to solve complex problems that arise in business, industry, and other organizations. It focuses on optimizing systems, processes, and decisions to improve efficiency and effectiveness.

Answer 18

A hypothetical future point in time when technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization. It is often associated with the development of artificial intelligence (AI) that surpasses human intelligence in all aspects.

Answer 19

A principle in neuroscience that states that neurons that fire together wire together. This means that if two neurons are frequently activated simultaneously, the connection between them is strengthened. Conversely, if two neurons are rarely activated simultaneously, the connection between them weakens.

Answer 20

C = AB
If A is of shape m x n and B is of shape n x p, then C is of shape m x p.
A must have the same number of columns as B has rows.
Example: A is 2x3 and B is 3x2, so C is 2x2.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
Matrix product of A and B:
C = [[1*7+2*9+3*11, 1*8+2*10+3*12], [4*7+5*9+6*11, 4*8+5*10+6*12]]
  = [[58, 64], [139, 154]]
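The shape rule and the worked example can be checked with a short pure-Python sketch. The function name `matmul` is an illustrative choice; in practice a library such as NumPy would normally be used instead.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(A))
    ]


A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]     # 3 x 2
C = matmul(A, B)                    # 2 x 2
print(C)  # [[58, 64], [139, 154]]
```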

Answer 21

Matrix A and Matrix B must be of the same shape. The result is obtained by multiplying the corresponding elements of the two matrices.
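A minimal sketch of the element-wise product, with made-up example matrices; the name `hadamard` is an illustrative choice.

```python
def hadamard(A, B):
    """Element-wise product of two matrices of identical shape."""
    same_shape = len(A) == len(B) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(A, B)
    )
    if not same_shape:
        raise ValueError("A and B must have the same shape")
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
H = hadamard(A, B)
print(H)  # [[5, 12], [21, 32]]
```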

Answer 22

Multiplying every value of a matrix by a scalar constant.
The resulting matrix has the same shape as the original, with each element multiplied by the scalar.

Answer 23

m x n
m = # of rows
n = # of columns

Answer 24

A collection of two or more linear equations with the same variables. The goal is to find values for the variables that satisfy all of the equations simultaneously. With two variables, each equation represents a straight line on a graph, and the solution to the system is the point(s) where these lines intersect.

Answer 25

I
A matrix that does not change any vector when we multiply that vector by that matrix.
The structure of the identity matrix is simple: all the entries along the main diagonal are 1, while all other entries are zero.

Answer 26

Produces the inverse matrix.
The inverse is another matrix that, when multiplied by our original matrix, produces the identity matrix.
A * B = B * A = I
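The defining property A * B = B * A = I can be verified on a small case. The sketch below handles only 2x2 matrices with the closed-form formula and uses exact fractions to avoid rounding; the example matrix is made up.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 integer matrix via the closed-form adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    f = Fraction(1, det)
    return [[d * f, -b * f], [-c * f, a * f]]


def matmul_2x2(X, Y):
    """Product of two 2x2 matrices."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


A = [[4, 7], [2, 6]]
B = inverse_2x2(A)
identity = [[1, 0], [0, 1]]
print(matmul_2x2(A, B) == identity, matmul_2x2(B, A) == identity)
```

A determinant of zero means no inverse exists, which is why the sketch raises an error in that case.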

Answer 27

The point specified by the vector of all zeros

Answer 28

Produced by multiplying each vector, in a list of vectors (matrix), by a corresponding scalar coefficient and then adding the results.
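As a small illustration with made-up vectors and coefficients, a linear combination can be computed component by component:

```python
def linear_combination(vectors, coeffs):
    """Return sum_i coeffs[i] * vectors[i] for equal-length vectors."""
    if len(vectors) != len(coeffs):
        raise ValueError("need exactly one coefficient per vector")
    dim = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(dim)]


vectors = [[1, 0], [0, 1], [1, 1]]
coeffs = [2, 3, -1]
result = linear_combination(vectors, coeffs)
print(result)  # [1, 2]
```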

Answer 29

The set of all points obtainable by linear combination of the original vectors

Answer 30

Refers to the set of all linear combinations of the columns of a matrix. It is a subspace of the vector space containing the columns.

Answer 31

Refers to a relationship between a set of vectors where at least one vector can be expressed as a linear combination of the others. Equivalently, if you can find a set of scalars (coefficients), not all zero, such that a linear combination of the vectors equals the zero vector, then the vectors are linearly dependent.
If you can add 2 columns together, after first multiplying each by a scalar constant, to get another column, those columns are linearly dependent.

Answer 32

True of a set of vectors if no vector in the set is a linear combination of the other vectors
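One way to test this in code is to compute the rank of the matrix whose rows are the vectors: the set is linearly independent exactly when the rank equals the number of vectors. The sketch below uses Gaussian elimination with exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from all rows below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def linearly_independent(vectors):
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0], [0, 1]]))                   # True
print(linearly_independent([[1, 2], [2, 4]]))                   # False
print(linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))  # False
```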

Answer 33

m = n and all columns are linearly independent

Answer 34

A square matrix with linearly independent columns

Answer 35

Tools used to measure the size of vectors.
Functions mapping vectors to non-negative values.
Measures the distance from the origin to the point x.

Answer 36

Generalization of the L1 and L2 norms.
L1 norm (p = 1)
L2 norm (p = 2)
Linf norm (p = inf)
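A minimal sketch of the general formula, (sum_i |x_i|^p)^(1/p), with the infinity case handled as the maximum absolute component. The function name and the example vector are illustrative.

```python
def lp_norm(x, p):
    """L^p norm of a vector x for p >= 1, or p = float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    if p < 1:
        raise ValueError("p must be >= 1 for this to be a norm")
    return sum(abs(v) ** p for v in x) ** (1 / p)


x = [3, -4]
l1 = lp_norm(x, 1)               # |3| + |-4| = 7
l2 = lp_norm(x, 2)               # sqrt(9 + 16) = 5
linf = lp_norm(x, float("inf"))  # max(|3|, |-4|) = 4
print(l1, l2, linf)
```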

Answer 37

aka Euclidean Norm
Measures the ordinary Euclidean distance from the origin.

Answer 38

aka Manhattan Norm
Measures the sum of the absolute values of the components of a vector.
Used in ML applications when the difference between zero and nonzero elements is very important.

Answer 39

aka Maximum Norm
Measures the maximum absolute value of the components of a vector.

Answer 40

Consists mostly of zeros and has nonzero entries only along the main diagonal.
Ex. Identity Matrix

Answer 41

Any matrix that is equal to its own transpose.
Often arises when the entries are generated by some function of two arguments that does not depend on the order of the arguments.

Answer 42

A vector with unit norm

Answer 43

Refers to a vector whose norm is equal to 1. Vector of length 1.

Answer 44

In linear algebra, refers to two vectors that are perpendicular to each other. In other words, if the dot product of two vectors is zero, they are orthogonal.
Orthogonal matrices are of interest because their inverse is very cheap to compute: it is simply the transpose.

Answer 45

Vectors that are not only orthogonal but also have unit norm

Answer 46

One of the most widely used kinds of matrix decomposition.
We decompose a matrix into a set of eigenvectors and eigenvalues.

Answer 47

A fundamental concept in linear algebra that represents the scaling factor associated with an eigenvector when a linear transformation is applied to it. In other words, an eigenvalue tells you how much its eigenvector is stretched or shrunk when it is multiplied by the matrix.

Answer 48

Vectors that remain unchanged in direction when a linear transformation is applied to them. In other words, when a matrix is multiplied by an eigenvector, the result is a scalar multiple of the original eigenvector.
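The defining relation A v = lambda v can be checked directly. The sketch below uses a made-up 2x2 integer matrix so that the comparison is exact; with floating-point data a tolerance would be needed instead of `==`.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_eigenpair(A, v, lam):
    """True if v is a nonzero vector satisfying A v == lam * v exactly."""
    nonzero = any(x != 0 for x in v)
    return nonzero and matvec(A, v) == [lam * x for x in v]


A = [[2, 1], [1, 2]]
print(is_eigenpair(A, [1, 1], 3))   # True: A v = [3, 3] = 3 * v
print(is_eigenpair(A, [1, -1], 1))  # True: A v = [1, -1] = 1 * v
print(is_eigenpair(A, [1, 0], 2))   # False: A v = [2, 1] is not a multiple of v
```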

Answer 49

A matrix whose eigenvalues are all positive

Answer 50

A matrix whose eigenvalues are all positive or zero

Answer 51

A matrix whose eigenvalues are all negative

Answer 52

A matrix whose eigenvalues are all negative or zero

Answer 53

Provides another way to factorize (decompose) a matrix, into singular vectors and singular values.
Enables us to discover some of the same kind of information as the eigendecomposition reveals, but the SVD is more generally applicable.
Every real matrix has an SVD, but that is not true for the eigenvalue decomposition.

Answer 54

Closely related to singular values and are obtained from the same decomposition. They provide information about the principal directions of the data represented by a matrix.

Answer 55

A fundamental concept in linear algebra that provides information about the scaling and rotation properties of a matrix. They are closely related to eigenvalues and eigenvectors.

Answer 56

Generalization of the inverse matrix for rectangular or non-invertible square matrices.

Answer 57

Gives the sum of all the diagonal entries of a matrix

Answer 58

Scalar value associated with a square matrix.
Function that maps matrices to real values.
Equal to the product of all eigenvalues of the matrix.
The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space.
If determinant = 0, space is contracted completely along at least one dimension.
If determinant = 1, the transformation preserves volume.
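A small sketch of the determinant by cofactor expansion, practical only for small matrices. The example matrix [[2, 1], [1, 2]] is made up; it has eigenvalues 3 and 1, so its determinant should equal 3 * 1 = 3.

```python
def det(M):
    """Determinant of a square matrix by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


A = [[2, 1], [1, 2]]          # eigenvalues 3 and 1
S = [[1, 2], [2, 4]]          # second row is twice the first
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(det(A))   # 3, the product of the eigenvalues
print(det(S))   # 0, space is collapsed along one dimension
print(det(I3))  # 1, volume is preserved
```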

Answer 59

Describes the agent's behavior - its mapping from percepts to actions.
Comparable to a policy.

Answer 60

Concrete implementation of the Agent Function.

Answer 61

Captures the notion of desirability by evaluating given sequences of environmental states.
Return?

Answer 62

The ability to know, for certain, the actual outcome of actions and then act accordingly to achieve goal.

Answer 63

Information Gathering actions are actions taken to acquire predefined information while explorative actions are taken to discover new/unknown information

Answer 64

Taking actions in order to modify future, predefined, percepts.
Ex. Looking both ways before you cross the street

Answer 65

Actions taken to discover new ideas, concepts, or possibilities with a focus to broaden understanding.

Answer 66

PEAS
1. Performance
2. Environment
3. Actuators
4. Sensors

Answer 67

1. Observability: Fully vs. Partially Observable
2. Agent: Single vs. Multi (multi-agent environments can be competitive or cooperative)
3. State-Transition: Deterministic vs. Nondeterministic
4. Interaction: Episodic vs. Sequential
5. Stationarity: Static vs. Dynamic
6. State/Action Space: Discrete vs. Continuous
7. Model: Known vs. Unknown

Answer 68

Stochastic = specified probabilities ("25% chance of rain tomorrow") vs Nondeterministic = non-specified probabilities ("chance of rain tomorrow")

Answer 69

Fully-Observable = the agent's sensors give it access to the complete state of the environment at each point in time
Partially-Observable = the sensors give only partial access
Unobservable = the sensors are unable to see any of the environment

Answer 70

The key distinction is whether object B's behavior is best described as maximizing a performance measure whose value depends on Agent A's behavior.
If yes: Multi Agent
If no: Single Agent
Multi Agent can be competitive or cooperative.

Answer 71

Deterministic = the next state of the environment is completely determined by the current state and the action executed by the agent(s)
Nondeterministic = otherwise

Answer 72

Episodic = Agent's experience is divided into atomic episodes with 1 time-step each
Sequential = The current decision could affect all future decisions
Checking Defective Parts = Episodic
Chess = Sequential

Answer 73

Nonstationary = the environment can change while the agent is operating within it
Stationary = otherwise

Answer 74

Describes handling/datatypes of the environment/state, time, and actions of a task.

Answer 75

Known = Agent has a full transition model of the environment where the outcome of every action is KNOWN
Unknown = otherwise

Answer 76

Computing device with physical sensors and actuators that the agent program will run on

Answer 77

1. Simple Reflex
2. Model-Based Reflex
3. Goal-Based
4. Utility-Based

Answer 78

Agent that selects actions on the basis of the current percept only, ignoring the rest of the percept history.
Operates via condition-action rules.
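A minimal sketch of such an agent, assuming a hypothetical two-square vacuum world with locations "A" and "B". The percept format and action names are assumptions made for illustration only.

```python
def simple_reflex_vacuum_agent(percept):
    """Choose an action from the current percept only (no percept history).

    percept is an assumed (location, status) pair, e.g. ("A", "Dirty").
    """
    location, status = percept
    # Condition-action rules, checked in order.
    if status == "Dirty":
        return "Suck"
    if location == "A":
        return "Right"
    return "Left"


for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
    print(percept, "->", simple_reflex_vacuum_agent(percept))
```

Because the function keeps no state between calls, the same percept always produces the same action, which is exactly what distinguishes it from a model-based agent.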

Answer 79

Agent that maintains a transition model and sensor model

Answer 80

Agents that utilize a goal describing situations that are desirable, like being at a particular destination.
Agent that has a reward signal of 1 only at the end of the episode.

Answer 81

Agents that utilize a goal but also a utility function to understand if one state is more desirable than another state on the way to achieving the goal.
Comparable to a value function.

Answer 82

if A then B
aka situation-action rule, production, if-then rule

Answer 83

Models "how the world (environment) works"

Answer 84

Models how the environment is reflected by the agent's percepts of it

Answer 85

1. Learning Element
2. Performance Element
3. Critic
4. Problem Generator

Answer 86

Element responsible for making improvements to the agent.
Uses feedback from the critic on how the agent is doing and determines how the performance element should be modified to do better in the future.

Answer 87

Element responsible for selecting external actions. Takes percepts and decides on actions

Answer 88

Determines how well the agent is doing

Answer 89

Responsible for suggesting actions that will lead to new and informative experiences. Responsible for exploration

Answer 90

1. Atomic
2. Factored
3. Structured
4. Distributed Representation

Answer 91

Each state of the world is indivisible - it has no internal structure

Answer 92

Splits each state up into a fixed set of variables/attributes, each of which can have a value.

Answer 93

Describes a state in terms of objects and the relationships between them.
Underlies relational databases, first-order logic, first-order probability models, and much of natural language understanding.

Answer 94

Vast set of tools for understanding data. These tools can be supervised or unsupervised.

Answer 95

Involves building a statistical model for predicting, or estimating, an output based on one or more inputs

Answer 96

There are inputs but no supervising output (unlike supervised learning); nevertheless, we can learn relationships and structure from such data.

Answer 97

Predicting a continuous/quantitative output value.
Supervised problems with a quantitative response.

Answer 98

Predicting a categorical/qualitative output value.
Supervised problems with a qualitative response.

Answer 99

Grouping elements together according to their observed characteristics.
Unsupervised.

Answer 100

1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences
2. Statistical learning should not be viewed as a series of black boxes
3. While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box
4. It is presumed that the reader is interested in applying statistical learning methods to real-world problems

Answer 101

aka predictors, independent variables, features, variables. Notation: X

Answer 102

aka response, dependent variable.
Notation: Y

Answer 103

Y = f(X) + epsilon

Answer 104

Random error term.
Nonsystematic information.
Independent of X and has mean zero.

Answer 105

Systematic Information

Answer 106

1. Prediction 2. Inference

Answer 107

Error introduced because of the chosen model. It is reducible because choosing a different model that better represents the true relationship could yield less error.

Answer 108

Even if our estimate of f were perfect, there would still be error because Y is a function of both f(X), the systematic part, and epsilon, the nonsystematic error term. Because epsilon cannot be predicted from X, this error is always present and is irreducible.
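A small simulation can make this concrete. The sketch below assumes a made-up true function f(x) = 2x + 1 and Gaussian noise with standard deviation 1; even when we predict with the true f itself, the mean squared error stays near Var(epsilon) = 1 rather than dropping to zero.

```python
import random


def true_f(x):
    """Assumed 'true' systematic relationship, chosen for illustration."""
    return 2.0 * x + 1.0


random.seed(0)   # fixed seed so the run is repeatable
n = 20000
sigma = 1.0      # assumed standard deviation of the noise term epsilon

xs = [random.uniform(0, 10) for _ in range(n)]
ys = [true_f(x) + random.gauss(0, sigma) for x in xs]

# Predict with the true f: the remaining error comes from epsilon alone.
mse_of_true_f = sum((y - true_f(x)) ** 2 for x, y in zip(xs, ys)) / n
print(mse_of_true_f)  # close to sigma ** 2
```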

Answer 109

Method where the problem of estimating our model is reduced to estimating a set of parameters.
Involves a two-step model-based approach:
1. We make an assumption about the functional form, or shape, of our model (for example, linear for a linear model)
2. We select a procedure that uses our training data to fit/train our selected model
Ex. (1) Linear Regression + (2) Ordinary Least Squares
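The two steps can be sketched for the simplest case, one predictor: step 1 assumes the form y = intercept + slope * x, and step 2 fits the two parameters by ordinary least squares using the closed-form solution. The data values are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up training data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

Estimating f has been reduced to estimating two numbers, which is the defining feature of a parametric method.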

Answer 110

Methods that do not make explicit assumptions about the functional form our model will take.
Instead, they seek an estimate that is as close to the data points as possible while still being generalizable (not overfit).
Because there is no functional form assumption, a greater amount of data is required in order to obtain an accurate estimate for f.

Answer 111

Artifact of the model and training process.
If the model is too flexible, it could learn the noise/errors of the data and not the overall relationship, and therefore not be a generalized solution.

Answer 112

In general, the more flexible the model, the less interpretable it is. Flexible methods (often nonparametric) can fit the data more closely, while restrictive methods (often parametric) are easier to interpret.

Answer 113

In this method, a model is trained on a dataset that contains a small amount of labeled data and a large amount of unlabeled data. Essentially using a model to learn on a small batch of labeled data and then using said model to label the unlabeled data.

Answer 114

Fundamentally, we need to measure the distance between unseen data (test data) and our model's predictions on said data.
Regression:
- MSE = mean squared error
Classification:
- Error Rate

Answer 115

1/n*sum( (y_true-y_pred)^2 )
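A direct translation of the formula, with made-up values for `y_true` and `y_pred`:

```python
def mean_squared_error(y_true, y_pred):
    """1/n * sum((y_true - y_pred)^2) over paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3, 5, 7]
y_pred = [2, 5, 9]
mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
print(mse)
```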

Answer 116

1/n*sum( I(y_true!=y_pred) )
Computes the fraction of incorrect classifications. We want this value to be minimized.
I(y_true!=y_pred) is an indicator variable that takes the value 0 or 1:
I = 1 if y_true != y_pred (misclassified)
I = 0 if y_true = y_pred (correct)
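A direct translation of the formula, with made-up labels. The indicator is written as an explicit 0/1 value to mirror the definition.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Indicator: 1 when misclassified, 0 when correct.
    indicators = [1 if t != p else 0 for t, p in zip(y_true, y_pred)]
    return sum(indicators) / len(y_true)


y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "cat", "dog", "cat"]
rate = error_rate(y_true, y_pred)  # one mistake out of four
print(rate)
```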

Answer 117

Model Driven Error
There is a true relationship in the data that we are trying to model. No model we use will capture that relationship perfectly, but some will capture it better than others.
If we have a nonlinear relationship in our data and try to build a model using linear regression, it will not model it very well and will have high bias. We could use a nonparametric model and estimate the relationship much better.

Answer 118

Data Driven Error
Variance refers to the amount by which our model would change if we estimated it using a different training data set.
Fundamentally, if we use different training data sets, we will generate a different model, and each of these models should be similar given similar training data.
High Variance = small changes in training data result in large changes to our model

Answer 119

Bias = Model Driven Error
Variance = Data Driven Error
A model that estimates the relationship very well (low bias) will be more susceptible to small changes in the data (high variance), while models that estimate the relationship more generally (high bias) will be less susceptible to small changes in the data (low variance).
As a general rule, as we use more flexible methods, the variance will increase and the bias will decrease.

- Deep Learning, Goodfellow
- Artificial Intelligence, Norvig
- Introduction to Statistical Learning in Python, Hastie
(147 cards)