Linear regression Flashcards
what is simple linear regression used for
Supervised learning:
predicting a quantitative response $Y$ (dependent) on the basis of a single predictor variable $X$ (independent)
If $f$ is to be approximated by a linear function, then it becomes:
$Y = \beta_0 + \beta_1 X + \varepsilon$
What does B0 mean
B0 ($\beta_0$) is the intercept term: the expected value of $Y$ when $X = 0$
If $f$ is to be approximated by a linear function, then it becomes:
$Y = \beta_0 + \beta_1 X + \varepsilon$
What does B1 mean
B1 ($\beta_1$) is the slope: the expected change in $Y$ for a one-unit increase in $X$
what is the process of linear regression
- assess the significance of the coefficients
- quantify the extent to which the model fits the data
(how well the line of best fit explains the data, using R squared)
how is the quality of linear regression assessed
Using the residual standard error (RSE).
E.g. if RSE = 3.26 with sales measured in thousands of units, actual sales in each market deviate from the true regression line by
approximately 3,260 units on average
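As a rough illustration of how the fit statistics are computed, the sketch below fits a simple linear regression by least squares and then reports the RSE and R squared. The data are made up and the function names are hypothetical; RSE is taken as sqrt(RSS / (n - 2)), the usual definition for a single predictor.

```python
import math

def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of the intercept (b0) and slope (b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

def rse_and_r_squared(xs, ys, b0, b1):
    """Residual standard error and R squared for a fitted line."""
    n = len(xs)
    y_mean = sum(ys) / n
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    tss = sum((y - y_mean) ** 2 for y in ys)                     # total variation
    rse = math.sqrt(rss / (n - 2))
    r_squared = 1 - rss / tss
    return rse, r_squared

# Made-up data, roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear_regression(xs, ys)
rse, r_squared = rse_and_r_squared(xs, ys, b0, b1)
print(f"intercept={b0:.2f} slope={b1:.2f} RSE={rse:.3f} R^2={r_squared:.4f}")
```

A small RSE relative to the scale of the response, and an R squared close to 1, both suggest the line fits the data well.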
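What happens when more variables are added to a linear regression model
R squared on the training data will increase (it never decreases), even if the added variables are only weakly related to the response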
What are the uncertainties when predicting using a MULTIPLE linear regression model
Reducible error: coefficients are only estimates for the true population regression plane
Model bias: a linear model (or any other model) for $f(X)$ is almost always an approximation of reality.
Irreducible error: the response cannot be predicted perfectly because of the random error $\varepsilon$ of the model
Assumptions of the linear model
Additivity: the effect of changes in a predictor $X_j$ on the response $Y$ is independent of the values of the other predictors (no other predictors affect it).
Linearity: the change in the response $Y$ due to a one-unit change in $X_j$ is constant, regardless of the value of $X_j$.
When is linear regression not applicable
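- when the response is qualitative: coding the outcomes as numbers (e.g. 1 = stroke) imposes an artificial ordering on them
- when the response is a probability: the fitted values can fall outside the range 0 to 1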
what is logistic regression
Logistic regression estimates the probability of an event occurring, such as voted or didn't vote (a discrete outcome), based on a given dataset of independent variables
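A minimal sketch of how a logistic model turns a predictor into a probability, assuming a single predictor and the standard logistic (sigmoid) function. The coefficients below are hypothetical, not estimated from any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, b0, b1):
    """Pr(event | x) for a one-predictor logistic regression."""
    return sigmoid(b0 + b1 * x)

# Hypothetical coefficients, chosen only for illustration
b0, b1 = -4.0, 0.8

probabilities = {x: predict_probability(x, b0, b1) for x in [0, 5, 10]}
for x, p in probabilities.items():
    print(f"x={x}: estimated probability {p:.3f}")
```

Whatever the value of x, the output stays strictly between 0 and 1, which is what a straight line cannot guarantee.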
what makes logistic regression different to linear
It is used to make a prediction about a categorical variable instead of a continuous one.
Its output is a probability between 0 and 1.
Logistic outcomes are categorical.
what is the drawback of logistic regression
it needs a large data set to have sufficient statistical power to detect a significant effect
what is the dummy variable approach
A way to use qualitative predictors with the logistic regression model.
Dummy variables assign the numbers "0" and "1" to indicate membership in any mutually exclusive and exhaustive category.
Each dummy variable takes the value 0 or 1.
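A small sketch of dummy coding, assuming the common convention of one 0/1 column per category with one category left out as the baseline. The function name and the example categories are hypothetical.

```python
def dummy_encode(values, baseline):
    """Return one dict per value, with a 0/1 flag for each non-baseline category."""
    categories = sorted(set(values) - {baseline})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Two-level predictor: a single dummy variable is enough
student = ["yes", "no", "yes"]
student_rows = dummy_encode(student, baseline="no")
print(student_rows)

# Three-level predictor: two dummy variables, "east" is the baseline
region = ["east", "west", "south"]
region_rows = dummy_encode(region, baseline="east")
print(region_rows)
```

The baseline category is the one for which every dummy variable equals 0.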
What is linear discriminant analysis
In LDA, we model the distribution of the predictors $X$ separately in each of the response classes (i.e. given $Y$), and then use Bayes' theorem to flip these around into estimates for $\Pr(Y = k \mid X = x)$.
3 reasons to use LDA
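When classes are well-separated: parameter estimates for the logistic regression model are surprisingly unstable.
If $n$ (the size of the data set) is small and the distribution of the predictors $X$ is approximately normal in each class, LDA is more stable than logistic regression.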
LDA is popular when we have more than two response classes.
it maximises separability
what happens if alpha is too small
the optimiser will take a long time to find the minimum
what is exploding gradient.
The gradients become extremely large (the slope is nearly vertical), so the updates blow up and training becomes completely unstable
what causes an exploding gradient.
Complex models with many parameters, such as large neural networks
what is a nested function
A function that embeds another function, e.g. $f(g(x))$. Neural networks are nested functions because each layer feeds its output into the next.
what is an activation function
function that decides whether information goes from one layer to another.
an example is a step function.
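As a sketch of the idea, the step function below passes a signal on (outputs 1) only when its input exceeds a threshold. ReLU is included for comparison as an assumption of this example; it is a widely used activation that is not mentioned in the cards.

```python
def step(z, threshold=0.0):
    """Step activation: 1 if the input exceeds the threshold, otherwise 0."""
    return 1 if z > threshold else 0

def relu(z):
    """ReLU activation (added for comparison): passes positive inputs, blocks the rest."""
    return max(0.0, z)

# Compare both activations on a few inputs
outputs = [(z, step(z), relu(z)) for z in [-2.0, 0.0, 3.0]]
for z, s, r in outputs:
    print(f"z={z}: step={s} relu={r}")
```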
what is the difference between bagging and boosting
go over this one.
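bagging: multiple models trained independently, each on a different random (bootstrap) sample of the same training set, with their predictions combined
boosting: models are trained one after another, concentrating on the data points that gave wrong predictions.
Each new model is trained on the points the previous models got wrong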
explain the tradeoff between accuracy and interpretability
More flexible (complex) models may give more accurate results, but they are harder to interpret; simpler models are easier to digest but may be less accurate.
which of bagging, boosting and random forests has the highest chance of overfitting when adding more data
Boosting, because each new model is fitted to the earlier models' mistakes, which increases the likelihood of overtraining; the model then becomes less effective at predicting future data
what is a recommender system
A recommendation system is an AI algorithm that uses Big Data to suggest or recommend additional products to consumers.
It draws on data such as past purchases, search history and demographic information.
what does machine learning do
Finds a mathematical formula which, when applied to a collection of inputs (the "training data"), produces the desired outputs.
what is machine learning
input + desired result
computation
= program
what is traditional programming
input + program
computation
= results
what are the different types of unsupervised learning
dimension reduction and clustering.
what is dimension reduction
a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible.
supervised learning
the training set gives the computer example answers.
e.g. pictures of cats and dogs are provided already labeled
what does x mean in an algorithm
Input, also known as features or exogenous variables
what does Y mean in an algorithm
Output, also known as label, response or endogenous variable: y
what does x1 or y1 mean in an algorithm
One observation from the historical (training) data: x1 is the input of the first example and y1 is its observed output.
what are the two types of supervised learning
classification and regression
what is classification
(what type of learning is it)
Supervised learning
we are trying to predict results which have discrete
output (i.e. category or class)
eg identifying objects or language
types of classification
Logistic Regression
Linear (and quadratic) Discriminant Analysis
K-Nearest Neighbors
RLab: Logistic, LDA, QDA, KNN
what is regression
we are trying to predict results which have continuous output.
like finding the line of best fit
e.g. stock price forecasting, correlation analysis, medical diagnosis, demand and sales volume analysis, …
semi supervised learning
It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both unsupervised learning (UL) and supervised learning (SL) while avoiding the challenge of finding a large amount of labeled data.
how does reinforcement learning work
The goal is to learn a policy which is a function (similar to the model in SL) that takes the feature vector of a state as input and outputs an optimal action to execute in that state.
The action is optimal if it maximizes the expected average reward.
the policy is constantly being updated
what is model-based and what is model-free reinforcement learning?
Model-based means memorizing lots of information
Model-free means generalizing situations
e.g. the self-driving car doesn't memorize every movement but tries to generalize situations and act rationally while obtaining a maximum reward.
what is a scalar
a numerical value like 2
what is a vector
An ordered list of scalar values, called attributes, like $\mathbf{b} = [-2, 5]$.
what is a matrix
A matrix is a rectangular array of numbers arranged in rows and columns.
$\begin{bmatrix} 2 & 6 & -1 \\ 30 & -6 & -3 \end{bmatrix}$
what is a function
A function is a relation that associates each element $x$ of a set $\mathcal{X}$ (the domain of the function) to a single element $y$ of another set $\mathcal{Y}$ (the codomain of the function).
where is the local minimum found
We say that $f(x)$ has a local minimum at $x = c$ if $f(x) \geq f(c)$ for every $x$ in some open interval around $x = c$.
what is a derivative
The derivative $f'$ of a function $f$ is a function or a value that describes how fast $f$ grows (or decreases).
what is differentiation
Differentiation is the process of finding a derivative.
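To make "how fast a function grows" concrete, the sketch below approximates a derivative numerically with a central difference. This is an illustration under assumed names, not a substitute for differentiating by hand; for f(x) = x squared the exact derivative is 2x.

```python
def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def square(x):
    return x ** 2

# The exact derivative of x**2 is 2*x, so the slope at x = 3 should be close to 6
slope_at_3 = numerical_derivative(square, 3.0)
slope_at_0 = numerical_derivative(square, 0.0)
print(f"slope at 3: {slope_at_3:.4f}, slope at 0: {slope_at_0:.4f}")
```

A slope of zero, as at x = 0 here, is where a minimum or maximum can occur.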
what is a discrete random variable
A random variable that takes only a countable number of distinct values,
e.g. a die roll can only be 1, 2, 3, 4, 5 or 6
what is a continuous random variable
A random variable that can take any of infinitely many values within an interval, e.g. a height or a weight
what is bayes rule
Conditional probability = the probability of the random variable $Y = k$ given the observed predictor vector $x_0$ of the random variable $X$:
$\Pr(Y = k \mid X = x_0) = \dfrac{\Pr(X = x_0 \mid Y = k)\,\Pr(Y = k)}{\Pr(X = x_0)}$
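A worked numerical sketch of Bayes' rule (posterior = likelihood × prior / evidence), using made-up probabilities for two hypothetical classes. The evidence term is obtained by summing likelihood × prior over all classes.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior Pr(Y = k | X = x0) for every class k.

    priors[k] is Pr(Y = k); likelihoods[k] is Pr(X = x0 | Y = k).
    """
    joint = {k: likelihoods[k] * priors[k] for k in priors}
    evidence = sum(joint.values())  # Pr(X = x0)
    return {k: joint[k] / evidence for k in joint}

# Made-up numbers for illustration only
priors = {"default": 0.1, "no_default": 0.9}
likelihoods = {"default": 0.6, "no_default": 0.2}

posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

Here the observation is three times as likely under "default", yet the posterior for "default" is only about 0.25 because that class is rare to begin with.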
what are parameters
variables that define the model learned by the learning algorithm (are directly modified by the algorithm based on the training data).
what does it mean if a model has low bias
the model predicts the training data well
what does it mean if a model has high bias
The model makes many mistakes on the training data.
The line of best fit may underfit the data, capturing only its general direction.
what are the causes of and solutions for high bias (underfitting)
Main reasons: - model is too simple for the data (linear regression)
- engineered features are not informative enough
Main solutions: - try a more complex model
- engineer features with higher predictive power
describe a model with low variance
Low variance = low sensitivity = performs well on both train and test sets.
describe a model with high variance
High variance = high sensitivity = performs well on the training set but poorly on the test set (overfitting)
what is a training set
BEFORE the model is built, the analyst feeds the algorithm input data, each example paired with an expected output.
The model evaluates the data repeatedly to learn more about the data's behavior and then adjusts itself to serve its intended purpose.
what is a test set
AFTER the model is built, testing data once again validates that it can make
accurate predictions.
Test data provides a final, real-world check of an unseen dataset to confirm that the ML algorithm was trained effectively.
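A minimal sketch of holding back a test set, assuming a simple random split with 20% of the rows reserved for testing. The function name, the split fraction and the toy data are all hypothetical.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold back a fraction as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy data: (input, expected output) pairs
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data)
print(f"{len(train)} training rows, {len(test)} test rows")
```

The model is fitted on `train` only; `test` stays unseen until the final check.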
what are the causes and solution for high variance (overfitting)
Problems
- model is too complex for the data (deep NN)
- too many features but a small number of training examples
Solutions
- try a simpler model
- add more training data if possible
- regularize the model (more widely used)
what is bagging (bootstrap aggregate)
and why is it useful
multiple models of the same algorithm with different random training samples
It reduces variance, which helps avoid overfitting the data
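A rough sketch of bagging, assuming a least-squares line as the base model and an average as the way predictions are combined. The data are made up and the function names are hypothetical.

```python
import random
from statistics import mean

def fit_line(points):
    """Least-squares intercept and slope for a list of (x, y) pairs."""
    x_mean = mean(x for x, _ in points)
    y_mean = mean(y for _, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:  # every sampled x is identical: fall back to a flat line
        return y_mean, 0.0
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def bagged_predict(points, x_new, n_models=25, seed=0):
    """Average the predictions of models fitted to bootstrap samples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement
        sample = [rng.choice(points) for _ in points]
        intercept, slope = fit_line(sample)
        predictions.append(intercept + slope * x_new)
    return mean(predictions)

# Made-up data close to y = 2x + 1
noise = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.0]
points = [(float(x), 2 * x + 1 + e) for x, e in zip(range(8), noise)]

prediction = bagged_predict(points, x_new=4.0)
print(f"bagged prediction at x=4: {prediction:.2f}")
```

Each individual model sees a slightly different data set, so their errors partly cancel when the predictions are averaged.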
what is boosting
Training models one after another, concentrating on the data points that gave wrong predictions.
Each new model is trained on the points the previous models got wrong.
It can cause overfitting.
what are the 4 types of classification methods
Logistic regression
Linear discriminant analysis (maximises distance)
Quadratic discriminant analysis (QDA)
K-nearest neighbours
if the boundaries are linear, which classification model should I use
Linear discriminant analysis and logistic regression
How to tell if a model is multivariate
If it has more than one x
how to tell if a model is linear and additive
No X is raised to a power other than 1, and no predictors are multiplied together.
A linear equation will always be in the form of $y = mx + b$
what does r1,r2 mean in an algorithm
They are different regions (leaves) of a decision tree
what happens if alpha (the learning rate) is too large in gradient descent
The optimizer will take big leaps, overshoot, and may never find the minimum
what happens if alpha (the learning rate) is too small in gradient descent
It will take a very long time to find the minimum
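The effect of the learning rate can be seen on a toy problem. The sketch below minimises f(x) = x squared (gradient 2x, minimum at x = 0) with three hypothetical values of alpha; the function, starting point and step count are assumptions chosen only for illustration.

```python
def gradient_descent(alpha, start=5.0, steps=50):
    """Minimise f(x) = x**2 by repeatedly stepping against the gradient 2*x."""
    x = start
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

too_small = gradient_descent(alpha=0.001)  # barely moves away from the start
reasonable = gradient_descent(alpha=0.1)   # ends very close to the minimum at 0
too_large = gradient_descent(alpha=1.1)    # overshoots more on every step

print(f"alpha=0.001 -> x={too_small:.3f}")
print(f"alpha=0.1   -> x={reasonable:.6f}")
print(f"alpha=1.1   -> x={too_large:.1f}")
```

With the same number of steps, only the middle value of alpha gets close to the minimum.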
what is a perceptron
A perceptron takes several binary inputs $x_1, x_2, x_3, \ldots$ and produces a single binary output as follows:
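One common formulation, assumed here because the card does not state the rule itself, is that the perceptron outputs 1 when the weighted sum of its inputs exceeds a threshold and 0 otherwise. The weights and threshold below are hypothetical and make the perceptron behave like a logical AND.

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Hypothetical weights and threshold: fires only when both inputs are 1
weights = [1.0, 1.0]
threshold = 1.5

truth_table = {
    (x1, x2): perceptron([x1, x2], weights, threshold)
    for x1 in (0, 1)
    for x2 in (0, 1)
}
print(truth_table)
```

Changing the threshold to 0.5 with the same weights would turn it into a logical OR.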