Linear Regression Flashcards

1
Q

what is simple linear regression used for

A

A supervised learning method for predicting a quantitative response 𝑌 (the dependent variable) on the basis of a single predictor variable 𝑋 (the independent variable).

2
Q

If 𝑓 is to be approximated by a linear function, then it becomes:
𝑌 = 𝛽0 + 𝛽1𝑋 + 𝜖

What does 𝛽0 mean?

A

𝛽0 is the intercept term: the expected value of 𝑌 when 𝑋 = 0.

3
Q

If 𝑓 is to be approximated by a linear function, then it becomes:
𝑌 = 𝛽0 + 𝛽1𝑋 + 𝜖
What does 𝛽1 mean?

A

𝛽1 is the slope: the average change in 𝑌 associated with a one-unit increase in 𝑋.
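A minimal sketch of how the intercept 𝛽0 and slope 𝛽1 can be estimated by least squares, in plain Python. The function name `fit_simple_ols` and the toy data are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data generated exactly from y = 1 + 2x (no noise)
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)  # 1.0 2.0
```

With noise-free data the estimates recover the true 𝛽0 = 1 and 𝛽1 = 2; with real data they are only estimates of the true coefficients.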

4
Q

what is the process of linear regression

A
  1. Assess the significance of the coefficients.
  2. Quantify the extent to which the model fits the data, e.g. using R² for the line of best fit.

5
Q

how is the quality of linear regression assessed

A

Using the residual standard error (RSE) and the R² statistic.

E.g. if RSE = 3.26 (sales measured in thousands of units), actual sales in each market deviate from the true regression line by approximately 3,260 units on average.
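A small sketch of the RSE calculation, RSE = sqrt(RSS / (n − p − 1)), where RSS is the residual sum of squares and p the number of predictors (p = 1 gives the simple-regression form sqrt(RSS / (n − 2))). The observed and fitted values below are hypothetical.

```python
import math

def residual_standard_error(ys, y_hats, p=1):
    """RSE = sqrt(RSS / (n - p - 1)); p = number of predictors."""
    n = len(ys)
    rss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # residual sum of squares
    return math.sqrt(rss / (n - p - 1))

# Hypothetical observed values and fitted values from a regression line
ys = [3.0, 5.0, 7.5, 8.5]
y_hats = [3.0, 5.0, 7.0, 9.0]
print(residual_standard_error(ys, y_hats))  # 0.5
```

The RSE is in the same units as 𝑌, which is why an RSE of 3.26 (thousands of units) reads as "about 3,260 units off on average".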

6
Q

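What happens when more variables are added to a linear regression model?

A

R² will always increase (or at least never decrease) on the training data, even if the added variables are only weakly associated with the response.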
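A sketch illustrating this, assuming ordinary least squares with an intercept: the same simulated response is fitted first on one real predictor and then on that predictor plus a column of pure noise. The small Gaussian-elimination solver and all data are illustrative, written in plain Python to stay dependency-free.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, ys):
    """Fit y on an intercept plus the given predictor columns by least squares; return R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(ys) / len(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    tss = sum((y - y_mean) ** 2 for y in ys)
    return 1 - rss / tss

random.seed(0)
n = 50
x1 = [random.uniform(0, 10) for _ in range(n)]
noise_col = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y
y = [3 + 2 * a + random.gauss(0, 2) for a in x1]

r2_one = r_squared([x1], y)
r2_two = r_squared([x1, noise_col], y)
print(r2_one, r2_two)  # r2_two >= r2_one even though noise_col is pure noise
```

This is why a higher training R² alone does not show that an added variable is useful.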

7
Q

What are the uncertainties when predicting using a MULTIPLE linear regression model

A

Reducible error: the coefficients are only estimates of the true population regression plane.

Model bias: a linear model (or any other model) for 𝑓(𝑋) is almost always only an approximation of reality.

Irreducible error: the response cannot be predicted perfectly because of the random error 𝜖 in the model.

8
Q

Assumptions of the linear model

A

Additivity: the effect of changes in a predictor 𝑋𝑗 on the response 𝑌 is independent of the values of the other predictors (i.e. there are no interaction effects between predictors).

Linearity: the change in the response 𝑌 due to a one-unit change in 𝑋𝑗 is constant, regardless of the value of 𝑋𝑗.

9
Q

When is linear regression not applicable

A
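  1. When the response is qualitative with more than two classes: coding the classes as numbers (e.g. 1 = stroke, 2 and 3 = other conditions) imposes an ordering and equal spacing between outcomes that does not exist.
  2. When the response is binary: linear regression can produce probability estimates outside the range 0-1.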
10
Q

what is logistic regression

A

Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote (a discrete outcome), based on a given dataset of independent variables.
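A minimal sketch of the logistic function p(X) = e^(𝛽0 + 𝛽1𝑋) / (1 + e^(𝛽0 + 𝛽1𝑋)), which keeps every prediction between 0 and 1. The coefficient values and the 0.5 cut-off are hypothetical, chosen only for illustration.

```python
import math

def logistic_probability(x, b0, b1):
    """p(X) = e^(b0 + b1 x) / (1 + e^(b0 + b1 x)), computed as 1 / (1 + e^-(b0 + b1 x))."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (in practice estimated from the training data)
b0, b1 = -4.0, 0.8
for x in [0, 5, 10]:
    p = logistic_probability(x, b0, b1)
    label = "event" if p >= 0.5 else "no event"  # turn the probability into a class
    print(x, round(p, 3), label)
```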

11
Q

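What makes logistic regression different from linear regression?

A

It is used to make a prediction about a categorical variable instead of a continuous one.

Its output is a probability between 0 and 1.

The predicted probabilities are turned into categories (classes), e.g. by assigning the class whenever the probability exceeds 0.5.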

12
Q

What is a drawback of logistic regression?

A

It needs a large data set to have sufficient statistical power to detect a significant effect.

13
Q

what is the dummy variable approach

A

A way to include qualitative predictors in a regression model, such as the logistic regression model.

Dummy variables assign the numbers ‘0’ and ‘1’ to indicate membership in any mutually exclusive and exhaustive category: 1 if the observation belongs to the category, 0 otherwise.
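A sketch of dummy coding in plain Python: one 0/1 column per category, except a baseline category that is represented by all zeros (so a predictor with k levels gets k − 1 dummies). The function name `dummy_encode` and the "region" data are hypothetical.

```python
def dummy_encode(values, baseline):
    """Create one 0/1 dummy column per category except the baseline."""
    categories = sorted(set(values) - {baseline})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical qualitative predictor with three levels
region = ["East", "West", "South", "East", "South"]
dummies = dummy_encode(region, baseline="East")
print(dummies)
# {'South': [0, 0, 1, 0, 1], 'West': [0, 1, 0, 0, 0]}
```

An "East" observation has 0 in both columns, so the baseline needs no column of its own.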

14
Q

What is linear discriminant analysis

A

In LDA, we model the distribution of the predictors 𝑋 separately in each of the response classes (i.e. given 𝑌), and then use Bayes’ theorem to flip these around into estimates for Pr(𝑌 = 𝑘 | 𝑋 = 𝑥).
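A sketch of the LDA idea with a single predictor, assuming each class density is normal with its own mean and a variance shared by all classes. Bayes’ theorem then turns priors × class densities into Pr(𝑌 = 𝑘 | 𝑋 = 𝑥). The priors, means and standard deviation below are hypothetical; in practice they are estimated from the training data.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def lda_posteriors(x, priors, means, shared_sd):
    """Pr(Y = k | X = x) for one predictor, assuming X | Y = k ~ Normal(mean_k, shared_sd^2)."""
    # Bayes' theorem: prior * class density, normalised over all classes
    scores = [p * normal_pdf(x, m, shared_sd) for p, m in zip(priors, means)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical two-class setup
priors = [0.5, 0.5]
means = [-1.0, 1.0]
print(lda_posteriors(0.0, priors, means, shared_sd=1.0))  # [0.5, 0.5]: x = 0 is on the boundary
print(lda_posteriors(2.0, priors, means, shared_sd=1.0))  # class 1 is now more likely
```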

15
Q

3 reasons to use LDA

A

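  1. When the classes are well separated, the parameter estimates for logistic regression are surprisingly unstable; LDA does not suffer from this problem.
  2. If 𝑛 (the number of observations) is small and the distribution of the predictors 𝑋 is approximately normal in each class, LDA is more stable than logistic regression.
  3. LDA is popular when we have more than two response classes.

It maximises the separability between the classes.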

16
Q

What happens if alpha (the learning rate) is too small?

A

the optimiser will take a long time to find the minimum

17
Q

What is an exploding gradient?

A

The gradients become extremely large (the slope is almost vertical), so each update changes the weights by a huge amount and training becomes completely unstable.

18
Q

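What causes an exploding gradient?

A

Complex models with many parameters, such as large (deep) neural networks: by the chain rule, the gradient is a product of many per-layer derivatives, and if these are larger than 1 the product grows exponentially with the number of layers.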
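A toy sketch of why depth matters, assuming a hypothetical "network" of 50 nested layers that each compute f(x) = weight · x (no activation functions, to keep the arithmetic visible). By the chain rule, the derivative of the whole nested function is the product of the per-layer derivatives, i.e. weight ** depth.

```python
def nested_linear_gradient(weight, depth):
    """Derivative of f(f(...f(x))) with f(x) = weight * x, nested `depth` times."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight  # chain rule: multiply in one layer's local derivative
    return grad

for w in [0.5, 1.0, 1.5]:
    print(w, nested_linear_gradient(w, depth=50))
```

A per-layer factor of 1.5 already gives a gradient in the hundreds of millions after 50 layers (exploding), while 0.5 shrinks it towards zero (the opposite problem, vanishing gradients).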

19
Q

what is a nested function

A

A function that embeds another function, e.g. 𝑓(𝑔(𝑥)). A neural network is a nested function: each layer applies a function to the output of the previous layer.

20
Q

what is an activation function

A

A function applied to a neuron’s weighted input that decides whether (and how strongly) information is passed from one layer to the next.

Examples: the step function, the sigmoid and ReLU.
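A sketch of three common activation functions. The step function passes a signal only when its input is positive; the sigmoid and ReLU are smooth or piecewise-linear alternatives that are easier to train with gradients.

```python
import math

def step(z):
    """Step function: output 1 only if the input is positive, else 0."""
    return 1 if z > 0 else 0

def sigmoid(z):
    """Smooth version of the step: output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

for z in [-2.0, 0.0, 2.0]:
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```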

21
Q

what is the difference between bagging and boosting

go over this one.

A

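Bagging: trains multiple models in parallel, each on a different bootstrap sample (random sample with replacement) of the same training set, then averages or votes on their predictions.

Boosting: trains models sequentially; each new model concentrates on the data points that the previous models predicted wrongly.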

22
Q

explain the tradeoff between accuracy and interpretability

A

More flexible (complex) models, e.g. boosting or deep neural networks, may give more accurate predictions but are harder to interpret than simple models such as linear regression.

23
Q

Between bagging, boosting and random forests, which has the most chance of overfitting when adding more trees?

A

Boosting: unlike bagging and random forests, boosting can overfit if the number of trees is too large, which overtrains the model and makes it less effective at predicting future data.

24
Q

what is a recommender system

A

A recommendation system is an AI algorithm that uses Big Data to suggest or recommend additional products to consumers.

It bases its suggestions on data such as past purchases, search history and demographic information.

25
Q

what does machine learning do

A

Finds a mathematical formula which, when applied to a collection of inputs (« training data »), produces the desired outputs.

26
Q

what is machine learning

A

input + desired results

computation

= program

26
Q

what is traditional programming

A

input + program

computation

= results

27
Q

what are the different types of unsupervised learning

A

dimension reduction and clustering.

28
Q

what is dimension reduction

A

a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible.
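A sketch of one common dimension-reduction technique, principal component analysis (PCA); the card does not name a specific method. It finds the direction of maximum variance (here via power iteration on the covariance matrix) and projects each 2-feature point onto it, keeping one feature instead of two. The simulated data are hypothetical.

```python
import math
import random

def first_principal_component(points, iterations=100):
    """Direction of maximum variance, found by power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # initial guess
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def reduce_to_1d(points, means, direction):
    """Project each centred point onto the direction: d features -> 1 feature."""
    return [sum((p[j] - means[j]) * direction[j] for j in range(len(direction))) for p in points]

random.seed(1)
# Hypothetical 2-feature data that mostly varies along the line y = x
points = []
for _ in range(200):
    t = random.gauss(0, 3)
    points.append([t + random.gauss(0, 0.3), t + random.gauss(0, 0.3)])

means, pc1 = first_principal_component(points)
scores = reduce_to_1d(points, means, pc1)
print(pc1)          # close to [0.707, 0.707]
print(len(scores))  # 200 values: one feature instead of two
```

Most of the information (the position along y = x) survives in the single remaining feature.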

29
Q

supervised learning

A

The training set gives the computer example answers (labelled data).

E.g. pictures of cats and dogs are provided already labelled as "cat" or "dog".

30
Q

what does x mean in an algorithm

A

Input, also known as features or exogenous variables

31
Q

what does Y mean in an algorithm

A

Output, also known as label, response or endogenous variable: y

32
Q

what does x1 or y1 mean in an algorithm

A

The subscript indexes the observations in the (historical) training data: (𝑥1, 𝑦1) is the first example, (𝑥𝑖, 𝑦𝑖) the 𝑖-th.

33
Q

what are the two types of supervised learning

A

classification and regression

34
Q

what is classification

(what type of learning is it)

A

Supervised learning

we are trying to predict results which have discrete
output (i.e. category or class)

eg identifying objects or language

35
Q

types of classification

A

Logistic Regression

Linear (and quadratic) Discriminant Analysis

K-Nearest Neighbors

RLab: Logistic, LDA, QDA, KNN

36
Q

what is regression

A

we are trying to predict results which have continuous output.

like finding the line of best fit

stock prices forecast, correlation analysis, medical diagnosis, demand and sales volume analysis,…

37
Q

semi supervised learning

A

It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both UL and SL while avoiding the challenges of finding a large amount of labeled data.

38
Q

how does reinforcement learning work

A

The goal is to learn a policy which is a function (similar to the model in SL) that takes the feature vector of a state as input and outputs an optimal action to execute in that state.

The action is optimal if it maximizes the expected average reward.

the policy is constantly being updated

39
Q

what is model-based and what is model-free reinforcement learning?

A

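Model-based: the agent learns (or is given) a model of how the environment behaves and uses it to plan, which means memorizing lots of information.

Model-free: the agent learns a policy directly from experience and generalizes across situations.

E.g. the self-driving car doesn’t memorize every movement but tries to generalize situations and act rationally while obtaining a maximum reward.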

40
Q

what is a scalar

A

a numerical value like 2

41
Q

what is a vector

A

An ordered list of scalar values, called attributes, like 𝑎 = [−2, 5].

42
Q

what is a matrix

A

A matrix is a rectangular array of numbers arranged in rows and columns, e.g.

[  2   6  −1 ]
[ 30  −6  −3 ]
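A small sketch of the scalar, vector and matrix examples from these cards as plain Python values (libraries such as NumPy are the usual choice in practice; plain lists are used here to stay dependency-free).

```python
scalar = 2                       # a single number
vector = [-2, 5]                 # an ordered list of scalars (attributes)
matrix = [[2, 6, -1],            # a 2 x 3 matrix stored row by row
          [30, -6, -3]]

n_rows, n_cols = len(matrix), len(matrix[0])
print(n_rows, n_cols)   # 2 3
print(vector[0])        # -2 (first attribute)
print(matrix[1][0])     # 30 (row 2, column 1)
```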

43
Q

what is a function

A

A function is a relation that associates each element π‘₯ of a set 𝒳 (the domain of the function) to a single element 𝑦 of another set 𝒴 (the codomain of the function).

44
Q

where is the local minimum found

A

We say that 𝑓(𝑥) has a local minimum at 𝑥 = 𝑐 if 𝑓(𝑥) ≥ 𝑓(𝑐) for every 𝑥 in some open interval around 𝑥 = 𝑐.

45
Q

what is a derivative

A

The derivative 𝑓′ of a function 𝑓 is a function or a value that describes how fast 𝑓 grows (or decreases).
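A sketch that approximates a derivative numerically with a central difference, (f(x + h) − f(x − h)) / 2h, for a small step h. The example function x² (exact derivative 2x) is chosen only for illustration.

```python
def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x): how fast f changes around x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def square(x):
    return x ** 2  # exact derivative: 2x

print(numerical_derivative(square, 3.0))  # approximately 6.0
```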

46
Q

what is differentiation

A

Differentiation is the process of finding a derivative.

47
Q

What is a discrete random variable?

A

A random variable that can take only a countable number of distinct values.

E.g. the roll of a die can only give 1, 2, 3, 4, 5 or 6.

48
Q

what is a continuous random variable

A

A random variable that can take any value in a continuous range (infinitely many possible values), e.g. a height or a temperature.

49
Q

what is bayes rule

A

Conditional probability: the probability that the random variable 𝑌 = 𝑗 given the observed predictor vector 𝑥0 of the random variable 𝑋:

Pr(𝑌 = 𝑗 | 𝑋 = 𝑥0) = Pr(𝑋 = 𝑥0 | 𝑌 = 𝑗) · Pr(𝑌 = 𝑗) / Pr(𝑋 = 𝑥0)
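A worked sketch of Bayes’ rule with two classes, where the denominator Pr(𝑋 = 𝑥0) comes from the law of total probability. All probabilities below are hypothetical.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Pr(Y = j | X = x0) = Pr(X = x0 | Y = j) * Pr(Y = j) / Pr(X = x0)."""
    return likelihood * prior / evidence

# Hypothetical numbers for two classes j = 0, 1
prior = {0: 0.7, 1: 0.3}        # Pr(Y = j)
likelihood = {0: 0.2, 1: 0.6}   # Pr(X = x0 | Y = j)
# Pr(X = x0) = sum over j of Pr(X = x0 | Y = j) * Pr(Y = j)
evidence = sum(likelihood[j] * prior[j] for j in prior)
posterior = {j: bayes_posterior(likelihood[j], prior[j], evidence) for j in prior}
print(posterior)  # approximately {0: 0.4375, 1: 0.5625}
```

Even though class 0 is more common a priori, observing 𝑥0 makes class 1 the more probable one.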

50
Q

what are parameters

A

variables that define the model learned by the learning algorithm (are directly modified by the algorithm based on the training data).

51
Q

what does it mean if a model has low bias

A

the model predicts the training data well

52
Q

what does it mean if a model has high bias

A

The model makes many mistakes on the training data.

The line of best fit may underfit the data, capturing only the general direction of the data.

53
Q

what are the causes of and solutions for high bias (underfitting)

A

Main reasons: - model is too simple for the data (linear regression)
- engineered features are not informative enough

Main solutions: - try a more complex model
- engineer features with higher predictive power

54
Q

describe a model with low variance

A

Low variance = low sensitivity = performs well on both train and test sets.

55
Q

describe a model with high variance

A

High variance = high sensitivity = performs well on train but poorly on test (overfitting).
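A sketch contrasting a high-variance and a lower-variance model on simulated data. k-nearest-neighbours regression is used here only as a convenient example (not something the card prescribes): with k = 1 each training point predicts itself, so training error is exactly zero while test error is not; k = 15 averages more points and is less sensitive. The data (sin(x) plus noise) are hypothetical.

```python
import math
import random

def knn_regress(train, x, k):
    """Predict y at x as the average y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, eval_data, k):
    """Mean squared error of the k-NN model fitted on `train`, evaluated on `eval_data`."""
    return sum((y - knn_regress(train, x, k)) ** 2 for x, y in eval_data) / len(eval_data)

def sample(n):
    """Hypothetical data: y = sin(x) plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = random.uniform(0, 6)
        data.append((x, math.sin(x) + random.gauss(0, 0.5)))
    return data

random.seed(2)
train, test = sample(100), sample(100)
for k in [1, 15]:
    print(f"k={k}: train MSE={mse(train, train, k):.3f}, test MSE={mse(train, test, k):.3f}")
```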

56
Q

what is a training set

A

BEFORE: the analyst feeds the algorithm input data, each example paired with an expected output.
The model evaluates the data repeatedly to learn more about the data’s behavior and then adjusts itself to serve its intended purpose.

57
Q

what is a test set

A

AFTER the model is built, testing data once again validates that it can make
accurate predictions.

Test data provides a final, real-world check of an unseen dataset to confirm that the ML algorithm was trained effectively.

58
Q

what are the causes and solutions for high variance (overfitting)

A

Problems
- model is too complex for the data (e.g. a deep NN)
- too many features but a small number of training examples

Solutions
- try a simpler model
- add more training data if possible
- regularize the model (the most widely used)
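A sketch of one common regularization method, ridge (L2) regression, for a single predictor; the card does not say which regularizer is meant. Minimising the squared error plus lam · 𝛽1² gives 𝛽1 = Sxy / (Sxx + lam), so larger lam shrinks the slope towards zero and makes the model less sensitive to the particular training set. The function name and toy data are hypothetical.

```python
def ridge_simple(xs, ys, lam):
    """Ridge estimates for y = b0 + b1 * x, penalising lam * b1**2 (intercept not penalised)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    b0 = y_mean - b1 * x_mean
    return b0, b1

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
for lam in [0, 10, 100]:
    print(lam, ridge_simple(xs, ys, lam))
```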

59
Q

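what is bagging (bootstrap aggregating)

and why is it useful

A

Multiple models of the same algorithm, each trained on a different random sample (drawn with replacement) of the training data; their predictions are averaged (or voted on).

It reduces variance, which helps avoid overfitting the data.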
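A sketch of bagging in plain Python: draw bootstrap samples (same size as the training set, with replacement), fit one model per sample, and average their predictions. A 1-nearest-neighbour regressor is used as a deliberately high-variance base model; the function names and data are hypothetical.

```python
import random

def fit_one_nn(train):
    """A deliberately high-variance model: 1-nearest-neighbour regression."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def bagging(train, fit, n_models=50, seed=0):
    """Fit `fit` on n_models bootstrap samples and return a predictor that averages them."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the training set, drawn WITH replacement
        boot = [rng.choice(train) for _ in train]
        models.append(fit(boot))

    def predict(x):
        # Aggregate by averaging (for classification one would take a majority vote)
        return sum(m(x) for m in models) / len(models)
    return predict

# Hypothetical training data: (x, y) pairs
train = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.8), (3.0, 3.3), (4.0, 3.9)]
bagged = bagging(train, fit_one_nn)
print(round(bagged(2.5), 3))
```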

60
Q

what is boosting

A

Sequentially selecting the data points which give wrong predictions: each new model is trained to focus on the points that the previous models predicted wrongly.

Boosting can overfit if too many models (e.g. trees) are added.
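A sketch of AdaBoost, one well-known boosting algorithm (the card does not name a specific one), with one-threshold "decision stumps" on 1-D data labelled +1/−1. After each round, the points the current stump got wrong are up-weighted, so the next stump concentrates on them; the final prediction is a weighted vote. The data and function names are hypothetical.

```python
import math

def stump_predict(x, threshold, sign):
    """A decision stump: predict `sign` if x >= threshold, otherwise -sign."""
    return sign if x >= threshold else -sign

def fit_stump(xs, ys, weights):
    """Find the stump with the lowest weighted error (labels are +1 / -1)."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n  # start with equal weight on every data point
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        say = 0.5 * math.log((1 - err) / err)  # weight of this stump in the final vote
        ensemble.append((say, t, sign))
        # Up-weight the points this stump got wrong, down-weight the rest, renormalise
        weights = [w * math.exp(-say * y * stump_predict(x, t, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def boosted_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(say * stump_predict(x, t, sign) for say, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 1-D data that no single threshold classifies perfectly
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, 1, 1, -1, -1]
model = adaboost(xs, ys)
print([boosted_predict(model, x) for x in xs])
```

Adding ever more rounds keeps fitting the training points harder, which is where boosting’s risk of overfitting comes from.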

61
Q

what are the 4 types of classification methods

A

Logistic regression

Linear discriminant analysis (LDA; maximises the separation between classes)

Quadratic discriminant analysis (QDA)

K-nearest neighbours (KNN)
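A sketch of the K-nearest-neighbours classifier: find the k training points closest to the query (Euclidean distance) and take a majority vote of their classes. The training points and classes are hypothetical.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    def dist2(a, b):
        # Squared Euclidean distance (same ordering as the true distance)
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda item: dist2(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-feature training data: (features, class)
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]
print(knn_classify(train, (1.2, 1.4)))  # A
print(knn_classify(train, (6.2, 6.1)))  # B
```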

62
Q

If the decision boundaries are linear, which classification model should I use?

A

Linear discriminant analysis and logistic regression

63
Q

How to tell if a model is multivariate

A

if it has more than one x

64
Q

how to tell if a model is linear and additive

A

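No 𝑋 is raised to a power other than 1 (no terms like 𝑋²), and the terms are simply added together (no products such as 𝑋1𝑋2).

A linear equation in one variable always has the form $y = mx + b$.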

65
Q

what does r1,r2 mean in an algorithm

A

They are the regions of the predictor space corresponding to the different leaves (terminal nodes) of a decision tree.

66
Q

What happens if alpha (the learning rate) is too large in gradient descent?

A

The optimizer takes such big leaps that it overshoots the minimum and may never converge (it can even diverge).

67
Q

What happens if alpha (the learning rate) is too small in gradient descent?

A

It takes a very long time (many iterations) to reach the minimum.
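A sketch of gradient descent, x ← x − alpha · f′(x), on the hypothetical function f(x) = (x − 3)², whose minimum is at x = 3. Running the same 50 steps with three learning rates shows both failure modes: a tiny alpha barely moves, a moderate alpha converges, and a too-large alpha overshoots further on every step and diverges.

```python
def gradient_descent(grad, x0, alpha, steps):
    """Repeatedly step against the gradient: x <- x - alpha * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

def grad_f(x):
    # f(x) = (x - 3)^2 has its minimum at x = 3; f'(x) = 2(x - 3)
    return 2 * (x - 3)

for alpha in [0.001, 0.1, 1.1]:
    print(alpha, gradient_descent(grad_f, x0=0.0, alpha=alpha, steps=50))
```

For this function each step multiplies the distance to the minimum by (1 − 2·alpha), so any alpha above 1 makes that factor larger than 1 in size, which is why 1.1 diverges.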

68
Q

what is a perceptron

A

A perceptron takes several binary inputs 𝑥1, 𝑥2, 𝑥3, … and produces a single binary output as follows: the output is 1 if the weighted sum Σ𝑗 𝑤𝑗𝑥𝑗 is greater than a threshold, and 0 otherwise.
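A sketch of the standard perceptron rule (output 1 if the weighted sum of the binary inputs exceeds a threshold, else 0). The weights [1, 1] and threshold 1.5 are hypothetical values that make this perceptron behave like a logical AND.

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the binary inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Hypothetical weights and threshold that implement logical AND
weights, threshold = [1, 1], 1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron([x1, x2], weights, threshold))
```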

69
Q
A