Final Exam Prep Flashcards

Question

What is the equation for MAE?

Answer 1

R_abs(h) = (1/n) \sum_{i=1}^{n} |y_i - h|
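A minimal sketch of the mean absolute error from Answer 1, treating h as a single constant prediction. The data values and the function name `mean_absolute_error` are made up for illustration. It compares the error at the median with the error at the mean.

```python
from statistics import median

def mean_absolute_error(ys, h):
    """Average absolute deviation of the data from a constant prediction h."""
    return sum(abs(y - h) for y in ys) / len(ys)

ys = [1, 2, 3, 10, 14]   # made-up data; median is 3, mean is 6

print(mean_absolute_error(ys, median(ys)))  # 4.2
print(mean_absolute_error(ys, 6))           # 4.8
```

On this data the median gives a smaller mean absolute error than the mean does.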

Answer 2

the median

Answer 3

n has to be odd

Answer 4

formal name for minimizing average loss

Answer 5

the midrange

Answer 6

an attribute of the data (columns)

Answer 7

numerical, categorical, boolean

Answer 8

we are overfitting to the data

Answer 9

because we want our model to generalize well to unseen data and make good predictions in the real world

Answer 10

H(x) = w_0 + w_1x^2

Answer 11

H(x) = w_0 e^{w_1 x}

Answer 12

R_sq(w_0, w_1) = (1/n) \sum_{i=1}^{n} (y_i - (w_0 + w_1 x_i))^2

Answer 13

(\sum_{i=1}^{n} (x_i - xbar)(y_i - ybar)) / (\sum_{i=1}^{n} (x_i - xbar)^2)

Answer 14

ybar - w_1*xbar
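A small sketch of how Answers 12 through 14 fit together: the slope and intercept formulas give the line that minimizes mean squared error. The function names and the data are assumptions chosen for illustration; the data lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (w0, w1) from the closed-form least squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    w1 = numerator / denominator      # slope
    w0 = y_bar - w1 * x_bar           # intercept
    return w0, w1

def mean_squared_error(xs, ys, w0, w1):
    """Average squared error of the line w0 + w1 * x on the data."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                     # made-up data on the line y = 1 + 2x
w0, w1 = fit_line(xs, ys)
print(w0, w1)                         # 1.0 2.0
print(mean_squared_error(xs, ys, w0, w1))  # 0.0
```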

Answer 15

the regression line

Answer 16

the process of finding optimal parameters

Answer 17

H*(x) = w_0* + w_1*x

Answer 18

the correlation coefficient

Answer 19

the strength of the linear association of two variables

Answer 20

-1 <= r <= 1

Answer 21

there is a negative association; the points trend from top left down to bottom right

Answer 22

there is a positive association; the points trend from bottom left up to top right

Answer 23

the correlation is stronger in those areas

Answer 24

(x_i - mean of x) / (SD of x)

Answer 25

(1/n) \sum_{i=1}^{n} ((x_i - mean of x) / SD of x) * ((y_i - mean of y) / SD of y)
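A hedged sketch of Answers 24 and 25: convert each variable to standard units, then average the products. It assumes the SD divides by n (the population SD), to match the 1/n in the formula. The function names are made up for illustration.

```python
from math import sqrt

def standard_units(values):
    """Subtract the mean and divide by the population SD."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Average of the products of x and y in standard units."""
    xs_su = standard_units(xs)
    ys_su = standard_units(ys)
    return sum(a * b for a, b in zip(xs_su, ys_su)) / len(xs)

r_up = correlation([1, 2, 3, 4], [3, 5, 7, 9])     # perfectly linear, increasing
r_down = correlation([1, 2, 3, 4], [9, 7, 5, 3])   # perfectly linear, decreasing
```

Perfectly linear data give r close to 1 or -1 (up to floating-point rounding), which matches the interpretation of r as the strength of linear association.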

Answer 26

SD of y increases and the slope gets steeper

Answer 27

SD of x increases and slope gets more shallow

Answer 28

finding models that maximize r^2

Answer 29

(SD of y)^2 * (1-r^2)
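A quick numerical check of the expression in Answer 29, under the assumption that it describes the mean squared error of the regression line. The sketch relies on the standard fact that the optimal slope equals r * (SD of y) / (SD of x). The function name and data are hypothetical.

```python
from statistics import fmean, pstdev

def regression_mse_two_ways(xs, ys):
    """Return (MSE computed directly, (SD of y)^2 * (1 - r^2))."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)     # population SDs (divide by n)
    r = sum(((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
            for x, y in zip(xs, ys)) / n
    w1 = r * sd_y / sd_x                    # optimal slope
    w0 = y_bar - w1 * x_bar                 # optimal intercept
    direct = sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / n
    shortcut = sd_y ** 2 * (1 - r ** 2)
    return direct, shortcut

direct, shortcut = regression_mse_two_ways([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])
```

The two values agree up to floating-point rounding, and both shrink toward 0 as r^2 approaches 1.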

Answer 30

the simple linear regression model

Answer 31

an ordered collection of n numbers; an element of R^n

Answer 32

the l_2 norm

Answer 33

||v|| = sqrt(v_1^2 + v_2^2 + ... + v_n^2)

Answer 34

a magnitude and a direction

Answer 35

u · v = u_1 v_1 + u_2 v_2 + ... + u_n v_n

Answer 36

a single number (a scalar)

Answer 37

||u|| ||v|| cos(theta)
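A minimal sketch tying Answers 33 through 37 together: the norm, the dot product, and the angle recovered from the cosine form of the dot product. The helper names are assumptions, and vectors are plain tuples.

```python
from math import acos, degrees, sqrt

def dot(u, v):
    """Sum of elementwise products; returns a single number."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Length of v: the square root of the sum of squared entries."""
    return sqrt(dot(v, v))

def angle_degrees(u, v):
    """Angle between u and v, from u · v = ||u|| ||v|| cos(theta)."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return degrees(acos(cos_theta))

print(norm((3, 4)))               # 5.0
print(dot((1, 2, 3), (4, 5, 6)))  # 32
```

For example, `angle_degrees((1, 0), (0, 1))` is about 90 because the dot product of those two vectors is 0.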

Answer 38

if and only if their dot product is 0

Answer 39

the angle between them is also 0

Answer 40

using an element wise sum

Answer 41

multiply each element by the scalar

Answer 42

any vector of the form a_1v_1 + a_2v_2 + ... + a_d v_d

Answer 43

the set of all vectors that can be created using linear combinations of those vectors

Answer 44

the orthogonal projection of y onto span(x)

Answer 45

e = y - wx

Answer 46

(x · y) / (x · x)

Answer 47

||e|| = ||y - wx||
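A short sketch of Answers 44 through 47: compute w = (x · y) / (x · x), form the projection wx, and check that the error vector e = y - wx is orthogonal to x. The function names and the example vectors are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_span(y, x):
    """Return (w, projection of y onto span(x), error vector e = y - w x)."""
    w = dot(x, y) / dot(x, x)
    projection = [w * xi for xi in x]
    error = [yi - pi for yi, pi in zip(y, projection)]
    return w, projection, error

x = [1, 2]
y = [3, 4]
w, projection, error = project_onto_span(y, x)   # w is 11 / 5
```

Here `dot(error, x)` comes out as 0 up to rounding, which is what makes this choice of w minimize ||e||.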

Answer 48

the orthogonal projection of y onto span(x)

Answer 49

a table of numbers with n rows and d columns

Answer 50

when they have the same dimensions

Answer 51

it is a matrix with n rows and 1 column

Answer 52

a linear combination of the columns of A using the weights in v

Answer 53

all of the vectors that can be written in the form Xw

Answer 54

the weights, also known as the parameter vector

Answer 55

X^T X w* = X^T y

Answer 56

w* = (X^T X)^{-1} X^T y
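A hedged sketch of the normal equations from Answers 55 and 56, restricted to a design matrix with exactly two columns (a column of ones and one feature) so the 2x2 inverse can be written out by hand. The function name and data are assumptions. A real workflow would more likely use a linear algebra library.

```python
def solve_normal_equations(X, y):
    """Solve X^T X w = X^T y when X has exactly two columns."""
    # Entries of the symmetric 2x2 matrix X^T X = [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    # Entries of the vector X^T y = [p, q].
    p = sum(row[0] * yi for row, yi in zip(X, y))
    q = sum(row[1] * yi for row, yi in zip(X, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X is not invertible, so w* is not unique")
    # Multiply the inverse (1/det) * [[d, -b], [-b, a]] by [p, q].
    return (d * p - b * q) / det, (a * q - b * p) / det

X = [[1, 1], [1, 2], [1, 3], [1, 4]]   # first column is all ones
y = [3, 5, 7, 9]                        # made-up data on the line y = 1 + 2x
print(solve_normal_equations(X, y))     # (1.0, 2.0)
```

When the two columns are multiples of each other the determinant is 0, which is the case where w* has infinitely many solutions.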

Answer 57

when X^TX is full rank

Answer 58

there are infinitely many solutions for w*

Answer 59

vector of all observed values, y

Answer 60

vector of predicted values, h

Answer 61

the vector of all errors between the observed and predicted values, e

Answer 62

a matrix where all the values for each feature are in columns and the first column is all ones

Answer 63

e_i = y_i - H(x_i)

Answer 64

||v|| = sqrt(v_1^2 + v_2^2 + ... + v_n^2)

Answer 65

a vector with all the parameter values, i.e. slope, intercept

Answer 66

a function of multiple variables

Answer 67

\nabla R_sq(w) = dR_sq/dw = vector of partial derivatives, one for each parameter

Answer 68

linear regression with multiple features

Answer 69

like a design matrix but all values are transposed

Answer 70

H(x) = w_1 (1/x_2) + w_2 sin(x) + w_3 e^x

Answer 71

the process of creating new features out of existing information in our dataset

Answer 72

as many as we want, as long as our hypothesis function is linear in the parameters

Answer 73

to the left and we need to decrease t

Answer 74

to the right and we need to increase t

Answer 75

t_1 = t_0 - alpha * df/dt (t_0)

Answer 76

our initial guess

Answer 77

the learning rate; the step size

Answer 78

as many times as we can until convergence
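A minimal sketch of the update rule and stopping idea in Answers 75 through 78. The learning rate is called `alpha` here, and the tolerance, iteration cap, and example function f(t) = (t - 3)^2 are all assumptions chosen for illustration.

```python
def gradient_descent(df, t0, alpha, tol=1e-8, max_iter=10_000):
    """Repeat t <- t - alpha * df(t) until the step is smaller than tol."""
    t = t0                                  # initial guess
    for _ in range(max_iter):
        t_next = t - alpha * df(t)          # step against the slope
        if abs(t_next - t) < tol:           # treat a tiny step as convergence
            return t_next
        t = t_next
    return t

# f(t) = (t - 3)^2 has derivative 2 * (t - 3) and its minimum at t = 3.
t_star = gradient_descent(lambda t: 2 * (t - 3), t0=0.0, alpha=0.1)
```

Starting from 0, the slope is negative, so each step increases t toward the minimizer at 3.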

Answer 79

a method for finding the input to a function f that minimizes the function

Answer 80

a technique for approximating the solution to a mathematical problem

Answer 81

if no line segment connecting two points on the graph ever goes below the graph between those points

Answer 82

|x - 4|, e^x, (x - 3)^24

Answer 83

it is the negative of a convex function

Answer 84

for a twice-differentiable function, checking the sign of the second derivative: if it is greater than or equal to 0 everywhere the function is convex, and if it is less than or equal to 0 everywhere it is concave

Answer 85

it converges to a global minimum as long as the step size is small enough

Answer 86

when the derivative is 0

Answer 87

it has a global minimum

Answer 88

yes but it is not guaranteed to find a global minimum

Answer 89

some process whose outcome is random

Answer 90

flipping a coin, rolling a die

Answer 91

an unordered collection of items

Answer 92

number of elements in set A

Answer 93

finite or countable set of possible outcomes of an experiment

Answer 94

assignment of probabilities to outcomes in S

Answer 95

0 <= p(s) <= 1

Answer 96

they have the same sample spaces but different probability distributions

Answer 97

a subset of the sample space

Answer 98

it assigns the probability of 1/n to each element of S

Answer 99

P(A or B) = P(A u B) = P(A) + P(B)

Answer 100

they cannot happen simultaneously

Answer 101

P(A u B) = P(A) + P(B) - P(A and B)

Answer 102

P(A and B) = P(A n B) = P(A) * P(B|A)

Answer 103

P(not A) = 1 - P(A)
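A small sketch that checks the addition, multiplication, and complement rules from Answers 101 through 103 on one roll of a fair six-sided die, using a uniform distribution. The events A and B and the helper `prob` are made up for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))               # sample space for one die roll

def prob(event):
    """Uniform probability: outcomes in the event divided by outcomes in S."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                      # roll is even
B = {4, 5, 6}                      # roll is at least 4

p_b_given_a = Fraction(len(A & B), len(A))   # restrict attention to A

print(prob(A | B))                           # 2/3
print(prob(A) + prob(B) - prob(A & B))       # 2/3
print(prob(A) * p_b_given_a)                 # 1/3
print(1 - prob(A))                           # 1/2
```

A and B overlap, so they are not mutually exclusive and the overlap has to be subtracted once.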

Answer 104

P(B|A) means "the probability that B happens, given that A happened."

Answer 105

P(B|A) = P(B)

Answer 106

A and B are independent if knowing that A happened gives you no additional information about B and vice versa

Answer 107

a trend that appears within each group reverses when the groups are combined

Answer 108

drawing one element uniformly at random and returning it to the list, repeat

Answer 109

drawing one element uniformly at random without returning it to the list, repeat
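A brief sketch of the two sampling schemes in Answers 108 and 109, using Python's standard `random` module: `choices` draws with replacement and `sample` draws without replacement. The population and the seed are arbitrary.

```python
import random

population = ["a", "b", "c", "d", "e"]
rng = random.Random(0)             # fixed seed so runs are repeatable

# With replacement: the same element can be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each element can be drawn at most once.
without_replacement = rng.sample(population, k=3)
```

Only the draw without replacement is guaranteed to contain no repeats.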

Answer 110

list, order matters, repetitions allowed (with replacement), elements in listed order

Answer 111

collection of elements, order does not matter, no repetitions allowed (without replacement), elements in no particular order

Answer 112

order matters, no repetitions allowed, counts the # of sequences of k distinct elements chosen from n possible elements

Answer 113

order does not matter, no repetitions allowed, counts the # of sets of size k chosen from n possible elements

Answer 114

n! / (n - k)!

Answer 115

n! / (k! (n - k)!)
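A quick sketch of the counting formulas in Answers 114 and 115, compared against `math.perm` and `math.comb` from the standard library (Python 3.8 or later). The function names are assumptions.

```python
from math import comb, factorial, perm

def count_permutations(n, k):
    """Sequences of k distinct elements chosen from n: n! / (n - k)!"""
    return factorial(n) // factorial(n - k)

def count_combinations(n, k):
    """Sets of size k chosen from n: n! / (k! (n - k)!)"""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(count_permutations(5, 2), perm(5, 2))   # 20 20
print(count_combinations(5, 2), comb(5, 2))   # 10 10
```

Each set of size k can be ordered in k! ways, which is why the combination count is the permutation count divided by k!.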

Answer 116

from the multiplication rule or conditional probability

Answer 117

P(B|A) = P(A|B) * P(B) / P(A)

Answer 118

P(B|A) = (P(A|B) * P(B)) / (P(B) * P(A|B) + P(not B) * P(A|not B))
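A hedged worked example of Bayes' theorem with the denominator expanded as in Answer 118. The numbers are invented: B has prior probability 1/100, and A occurs with probability 9/10 when B happens and 1/10 when it does not.

```python
from fractions import Fraction

def posterior(p_a_given_b, p_b, p_a_given_not_b):
    """P(B|A) via Bayes' theorem, expanding P(A) over B and not B."""
    numerator = p_a_given_b * p_b
    denominator = numerator + (1 - p_b) * p_a_given_not_b
    return numerator / denominator

p = posterior(
    p_a_given_b=Fraction(9, 10),
    p_b=Fraction(1, 100),
    p_a_given_not_b=Fraction(1, 10),
)
print(p)   # 1/12
```

The belief in B moves from a prior of 1/100 to a posterior of 1/12 once A is known to have happened.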

Answer 119

how to update the probability of one event given that another has occurred

Answer 120

prior belief that A happens

Answer 121

our updated belief that A happens, now that we know B happens

Answer 122

getting heads on a first coin flip does not affect the outcome of the second flip

Answer 123

at least one of them must have a zero probability

Answer 124

almost never

Answer 125

events that become independent upon learning some new information and vice versa

Answer 126

making predictions based on examples

Answer 127

P(class|features) = P(features|class) * P(class) / P(features)

Answer 128

whichever has the larger numerator

Answer 129

based on the training data
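A rough sketch of the classification idea in Answers 127 through 129: estimate the probabilities from training data, compute the numerator P(features|class) * P(class) for each class, and predict the class with the larger numerator. The tiny data set is invented, and the sketch assumes the features are conditionally independent given the class so that P(features|class) factors into a product. No smoothing is applied.

```python
from fractions import Fraction

# Hypothetical training data: ((word, length), label).
training = [
    (("free", "short"), "spam"),
    (("free", "long"), "spam"),
    (("work", "short"), "ham"),
    (("work", "long"), "ham"),
    (("free", "short"), "ham"),
]

def numerator(features, label):
    """Estimate P(class) times the product of P(feature_i | class)."""
    rows = [f for f, l in training if l == label]
    result = Fraction(len(rows), len(training))          # P(class)
    for i, value in enumerate(features):
        matches = sum(1 for f in rows if f[i] == value)
        result *= Fraction(matches, len(rows))           # P(feature_i | class)
    return result

def classify(features):
    """Pick the label whose numerator is largest."""
    labels = sorted({l for _, l in training})
    return max(labels, key=lambda l: numerator(features, l))

print(numerator(("free", "short"), "spam"))   # 1/5
print(numerator(("free", "short"), "ham"))    # 2/15
print(classify(("free", "short")))            # spam
```

The denominator P(features) is the same for every class, so comparing numerators is enough to pick a prediction.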

Answer 130

spam from ham (good, non-spam email)

Answer 131

ignores location of words within an email and frequency of words

Answer 132

better handling of previously unseen data

(174 cards)