Intro Flashcards

1
Q

Why is machine learning popular?

A

-Lots of data is available
-Current control theory methods struggle to solve large-scale, complex problems

2
Q

What are the types of supervised learning?

A

regression and classification

3
Q

what are the types of unsupervised learning?

A

clustering and dimensionality reduction

4
Q

what are the types of reinforcement learning?

A

Value iteration and policy iteration

5
Q

What is supervised learning?

A

Learning a function that maps an input to an output based on labelled example input-output pairs

6
Q

what is unsupervised learning?

A

An algorithm that learns patterns from unlabelled data

7
Q

What is the key difference between regression and classification?

A

In regression the output data is continuous, whereas in classification the output data is discrete

8
Q

How does regression work?

A

find a function that minimises a cost function (most often mean squared error)

9
Q

Describe a nearest neighbour model?

A

Each data point is grouped according to its proximity to its nearest neighbours in the training data

10
Q

Describe a piecewise linear model

A

The data follows different linear trends over different regions of the input space

11
Q

What are some model types?

A

Linear, low order polynomial, high order polynomial, piecewise linear, nearest neighbour

12
Q

When does overfitting occur?

A
  • when a model fits the training data set too closely and is unable to generalise
  • when the density of data is low
13
Q

What is a characteristic of overfitting?

A

oversensitivity to measurement noise

14
Q

How can overfitting be avoided?

A

do not use a model that is more complicated than required (Occam’s razor)

15
Q

What is a white box model?

A

-increased system information
-low model uncertainty

16
Q

What is a black box model?

A

-decreased system information
-high model uncertainty

17
Q

what is inference?

A

The process by which predictions are made from a trained model

18
Q

What does the expected mean square error of the prediction depend on?

A

bias and variance

19
Q

what is meant by high bias?

A

model fails to capture the underlying structure of the data (underfitting)

20
Q

what is meant by high variance?

A

model is sensitive to small fluctuations in the data (overfitting)

21
Q

when is variance high?

A

in complex models

22
Q

what is the bias-variance trade-off?

A

If bias is increased then variance decreases, and vice versa. We therefore need to minimise both bias and variance together.

23
Q

what is meant by error?

A

The difference between the true value and the predicted value

24
Q

what happens in simple linear regression?

A

identify a line of best fit y = a0 + a1 x + err, where a0 and a1 need to be determined

25
Q

How do you find a0 and a1 that minimises the sums of square errors of residuals?

A

-Find the stationary point by taking the partial derivatives of the sum of squared residuals with respect to a0 and a1
-Set the partial derivatives to zero and solve the resulting optimisation problem (see the sketch below)
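
Setting both partial derivatives to zero gives closed-form expressions for a0 and a1. A minimal Python sketch with made-up data (the values are illustrative assumptions, not from the notes):

```python
import numpy as np

# Hypothetical data for y = a0 + a1*x + err
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 5.0, 7.2, 8.8])

# Closed-form estimates obtained by setting the partial derivatives of the
# sum of squared residuals to zero
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()
print(a0, a1)  # intercept and slope of the fitted line
```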

26
Q

What is the ordinary least squares (OLS) method?

A

Appropriately construct X, Y and theta, and solve
theta = (X^T X)^-1 (X^T Y)
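
A minimal NumPy sketch of this formula on assumed toy data; the normal equations are solved directly here purely for illustration:

```python
import numpy as np

# Assumed toy data: one feature plus an intercept column.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack([np.ones_like(x), x])   # design / regressor matrix

# theta = (X^T X)^-1 (X^T Y), computed without forming the inverse explicitly
theta = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta)  # [intercept, slope]
```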

27
Q

What is the X matrix known as?

A

design matrix
regressor matrix

28
Q

When does the OLS method not work for linear regression?

A
  • If X^T X is not invertible, the normal equations cannot be solved
  • X^T X is not invertible if the OLS problem has non-unique solutions
29
Q

What is meant by collinearity?

A

Two sequences of data are said to be collinear if there exists k ≠ 0 such that x_1i = k x_2i for all i
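
A tiny numerical illustration with assumed data: when one feature is a scalar multiple of another, X^T X becomes singular, so the OLS inverse does not exist.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1                       # collinear: x2_i = k * x1_i with k = 2
X = np.column_stack([x1, x2])

XtX = X.T @ X
print(np.linalg.det(XtX))           # ~0: X^T X is singular
print(np.linalg.matrix_rank(XtX))   # rank 1 < 2, so (X^T X)^-1 does not exist
```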

30
Q

What occurs in the OLS if there is a pair of feature data sequences that are collinear?

A

the associated OLS problem has infinitely many optimal solutions

31
Q

when does collinearity occur?

A

when two feature variables are highly correlated providing redundant information

32
Q

How can you deal with collinearity in data?

A
  • increase the amount of training data
  • find and remove highly correlated data
33
Q

What are some issues with the OLS method?

A
  • Computing the inverse of X^T X can be computationally expensive
  • If the data is close to being collinear, the OLS solution becomes very sensitive to small changes in the training data set
34
Q

Which models can be fit using the OLS method?

A

Models that are linear in their parameters (they may still be non-linear in the features)

35
Q

How can the goodness of fit of a regression model be assessed?

A

Using R^2 coefficient
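
A short sketch of the usual R^2 = 1 - SS_res/SS_tot definition (assumed values, standard formula):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # assumed targets
y_pred = np.array([1.1, 1.9, 3.2, 3.9])   # assumed model predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # sum of squared errors
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(r2)  # near 1 for a good fit; negative for a very bad model
```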

36
Q

what does an R^2 value approximately equal to 1 indicate?

A

The sum of squared errors is small, indicating a good model

37
Q

What does a negative R^2 value mean?

A

A very bad model (worse than simply predicting the mean of the data)

38
Q

what does a small positive R^2 value indicate?

A

bad model

39
Q

What is the Weierstrass Approximation Theorem?

A

For any function f that is continuous on a closed interval [a, b], and any ε > 0, there exists a polynomial p such that sup_x∈[a,b] |f(x) − p(x)| < ε

40
Q

What is regularization used for?

A
  • Prevent overfitting to training data
  • remove user choice from a model
41
Q

Why do we use regularization?

A
  • Increasing the number of model parameters fits the training data more accurately, but unnecessary terms can cause overfitting.
42
Q

What is classification?

A

Supervised learning where the output data is discrete (class labels)

43
Q

What does a Bayes classifier do differently?

A

It constructs a probability distribution over the classes rather than a deterministic model

44
Q

What is a perceptron?

A

An algorithm for the supervised learning of binary classifiers whose decision boundary is a hyperplane.
The simplest neural network

45
Q

What issues arise in non-linear regressions?

A

-There is no unique solution
-We settle for approximate, iterative solutions (e.g. the Newton-Raphson method)

46
Q

What does gradient descent do?

A

Iteratively identifies a local minimum of the cost function; it can be used to solve the OLS problem

47
Q

What are the issues with high dimension feature space?

A

-Hard to visualise data in large dimensions
-OLS fails

48
Q

How do you find Principal Components?

A
  1. Compute the centred data matrix X~
  2. Compute X~^T X~
  3. Find the orthonormal eigenvector of X~^T X~ with the largest eigenvalue (see the sketch below)
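
A minimal NumPy sketch of these three steps on an assumed small data matrix (rows are samples, columns are features):

```python
import numpy as np

X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])               # assumed data

Xc = X - X.mean(axis=0)                  # 1. centred data matrix X~
C = Xc.T @ Xc                            # 2. X~^T X~
eigvals, eigvecs = np.linalg.eigh(C)     # 3. orthonormal eigenvectors (symmetric matrix)
pc1 = eigvecs[:, np.argmax(eigvals)]     #    eigenvector with the largest eigenvalue
print(pc1)                               # first principal component direction
```
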
49
Q

What are the principal components?

A

-The orthonormal vector directions with the largest variation in the data
-The orthonormal vectors that define a linear manifold giving minimal reconstruction error
-The orthonormal eigenvectors of X~^T X~ with the largest eigenvalues
-The q columns of W corresponding to the q largest squared singular values, where the singular value decomposition of the regressor matrix is X = UΣW^T

50
Q

What is clustering?

A

A class of unsupervised learning methods that separates data into groups by similarity

51
Q

What does the Weierstrass Theorem do? (in simpler terms)

A

It guarantees that a continuous function can be approximated to arbitrarily high accuracy by a finite-degree polynomial, provided the function is defined over a finite (closed) interval

52
Q

when is a matrix non invertible?

A

when the determinant is equal to zero

53
Q

What are the disadvantages of using polynomials in modelling?

A
  • Many coefficients and parameters in high-degree polynomials
  • No guarantee of approximating discontinuous functions, e.g. tan(x)
  • Slow convergence rates
  • Polynomials tend towards infinity, which is unnatural system behaviour
54
Q

How does regularization prevent overfitting?

A

It penalises unnecessary non-zero parameters to help prevent the model becoming oversensitive to noise in the training data

55
Q

How do we select lambda in regularization?

A

Randomly sample candidate values of lambda to find the optimal one
Performance is typically evaluated through cross-validation

56
Q

What is the perceptron equation?

A

f(x)=sgn(w^Tx)
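
A minimal sketch of this rule together with the classic perceptron update, on assumed linearly separable data (the bias is folded into w via a constant feature of 1):

```python
import numpy as np

# Assumed toy data: two features plus a constant 1 for the bias term.
X = np.array([[ 1.0,  2.0, 1.0],
              [ 2.0,  1.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(3)
for _ in range(10):                   # a few passes over the data
    for xi, yi in zip(X, y):
        if np.sign(w @ xi) != yi:     # misclassified point
            w += yi * xi              # perceptron learning rule
print(np.sign(X @ w))                 # predictions f(x) = sgn(w^T x)
```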

57
Q

What is the structure of the perceptron?

A

x, w, sum everything, step, output

58
Q

How do you form a non-linear decision boundary?

A

Add more basis functions

59
Q

What is the equation for a support vector machine?

A

𝑓(𝑥;𝜃)=𝜃0 + ∑ i∈S 𝜃i 𝐾(𝑥,𝑥i)
where 𝑆={indices of support vectors} and K:ℝn×ℝn→ℝ are kernel functions
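
A small sketch of evaluating this decision function with an RBF kernel; the support vectors and coefficients below are made-up assumptions (this is not a trained SVM):

```python
import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    # K(x, x_i) = exp(-gamma * ||x - x_i||^2)
    return np.exp(-gamma * np.sum((x - xi) ** 2))

support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0]])   # assumed x_i, i in S
theta = np.array([0.8, -0.8])                            # assumed theta_i
theta0 = 0.1                                             # assumed theta_0

def f(x):
    # f(x; theta) = theta_0 + sum over i in S of theta_i * K(x, x_i)
    return theta0 + sum(t * rbf_kernel(x, sv)
                        for t, sv in zip(theta, support_vectors))

print(np.sign(f(np.array([0.9, 1.2]))))   # class prediction for a new point
```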

60
Q

What are the advantages of linear/logistic regression over support vector machines?

A
  • Can adjust the decision threshold to shape the TPR and FPR
  • Gives a probabilistic interpretation
61
Q

What are the advantages of support vector machines over logistic regression?

A
  • Robust to noise far away from the true decision boundary
  • Perfectly separates the data when possible
62
Q

Why is gradient descent a useful algorithm?

A

(X^T X)^-1 does not have to be computed, so it is computationally cheaper
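
A minimal gradient-descent sketch for the least-squares cost that avoids the matrix inverse entirely (learning rate and iteration count are assumptions):

```python
import numpy as np

# Assumed data: design matrix with an intercept column, and targets y = 1 + 2x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = np.zeros(2)
lr = 0.05                                   # assumed learning rate
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ theta - y)      # gradient of ||X theta - y||^2
    theta -= lr * grad / len(y)             # step in the negative gradient direction
print(theta)                                # converges towards the OLS solution [1, 2]
```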

63
Q

What do convex functions have?

A

A unique global minimum

64
Q

What is the equation for calculating Term Frequency?

A

TF= number of times the term appears in text / Total number of terms in text

65
Q

What is the equation for Inverse Document Frequency?

A

IDF = log10(number of documents / number of documents containing the term)

66
Q

What is the equation for TFIDF?

A

TFIDF = TF x IDF
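
A small sketch combining the three formulas above on an assumed toy corpus:

```python
import math

documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
term = "cat"
words = documents[0].split()

tf = words.count(term) / len(words)                      # term frequency
docs_with_term = sum(term in d.split() for d in documents)
idf = math.log10(len(documents) / docs_with_term)        # inverse document frequency
tfidf = tf * idf
print(tf, idf, tfidf)
```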

67
Q

What does a low TFIDF indicate?

A

Common, uninformative words: terms that appear in most documents have an IDF close to zero, so their TFIDF is low

68
Q

Why do we use PCA?

A

because it is computationally expensive to solve OLS for large data sets

69
Q

What is the issue with large dimension data?

A

-Many optimal models with MSE = 0
-X^T X will typically be non-invertible
-So the OLS method cannot be used

70
Q

Why do we use unsupervised learning?

A

Most data sets are unlabelled
it is costly to label data sets

71
Q

What is the Singular Value decomposition of X?

A

X = UΣW^T
U => unitary matrix where U^T U = I
W => unitary matrix where W^T W = I
Σ => diagonal matrix with non-negative elements ordered largest to smallest
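
A quick NumPy check of these properties on an assumed small matrix (np.linalg.svd returns W^T directly):

```python
import numpy as np

X = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])                           # assumed data

U, s, Wt = np.linalg.svd(X, full_matrices=False)     # X = U Sigma W^T
Sigma = np.diag(s)                                   # singular values, largest first

print(np.allclose(X, U @ Sigma @ Wt))                # reconstruction holds
print(np.allclose(U.T @ U, np.eye(2)))               # U^T U = I
print(np.allclose(Wt @ Wt.T, np.eye(2)))             # W^T W = I
```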

72
Q

What are the columns of W in SVD?

A

the orthonormal eigenvectors of the X^T X matrix

73
Q

What is XTX as a SVD?

A

X^T X = WΣ^T U^T U ΣW^T
= WΣ^T ΣW^T (since U^T U = I)

74
Q

If a data point is equidistant from two cluster centres, how do you choose which one to assign it to?

A

The cluster with the lowest index, by convention

75
Q

What is the average dissimilarity in a cluster equation?

A

(1 / number of elements in the cluster) × the sum of squared L2 (Euclidean) norms between the cluster centre and each data point in the cluster

76
Q

What is the K-Means algorithm?

A
  1. Randomly assign a number from 1 to K to each data point
  2. Iterate until the cluster assignments stop changing (see the sketch below):
    • Compute the centroid for each cluster
    • Update each cluster assignment to the closest cluster centre
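
A compact NumPy sketch of these steps with an assumed K and toy data (a fixed number of iterations is used here instead of an explicit convergence check):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),   # assumed toy data: two blobs
               rng.normal(5.0, 0.5, (20, 2))])
K = 2

labels = rng.integers(0, K, len(X))             # 1. random initial assignment
for _ in range(10):                             # 2. iterate
    # compute the centroid of each cluster
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    # update each assignment to the closest cluster centre
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
print(centroids)
```
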
77
Q

What are the advantages of the K-Means algorithm?

A

More computationally efficient than brute force method

78
Q

What are the disadvantages of the K-Means algorithm?

A
  • must select number of clusters
  • Doesn’t necessarily converge to optimal clusters
  • cannot handle non-convex clusters
79
Q

In ARX models how do you obtain an unbiased estimate from the least squares solution?

A

If the regressor matrix Psi does not contain any noise terms, the least squares estimate is unbiased

80
Q

How is a AR model displayed?

A

AR(ny)

81
Q

How is an ARX model displayed?

A

ARX(ny, nu)

82
Q

How is an ARMAX model displayed?

A

ARMAX(ny, ne, nu)

83
Q

What does the Moving Average mean in an ARMAX model?

A

The model is dependent on delayed error/noise.

84
Q

What would applying OLS to an ARMAX model result in?

A

A biased estimate

85
Q

How would you show ARX model is unbiased?

A

Take the expectation of the OLS solution: E[theta_hat] = E[(X^T X)^-1 X^T Y]
Substitute Y = X theta* + e
This gives E[theta_hat] = theta* + (X^T X)^-1 X^T E[e] = theta*, since E[e] = 0

86
Q

What is the OLS solution with L2 regularization?

A

theta = (X^T X + lambda I)^-1 X^T Y
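
A minimal NumPy sketch of this regularized solution (data and lambda are assumed for illustration):

```python
import numpy as np

# Assumed design matrix (with an intercept column), targets, and lambda.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])
lam = 0.1

# theta = (X^T X + lambda * I)^-1 X^T Y
theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
print(theta)
```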