Big Ideas Flashcards
Statistical modelling
Modelling is the process of incorporating information into a tool that can make forecasts and predictions.
Statistical modelling equation
Y = f(X) + e
Prediction
Once we have a good estimate of f(X), we can use it to make predictions on new data. We treat f as a black box, since we only care about the accuracy of the predictions, not necessarily how f works.
Inference
We want to understand the relationship between X and Y. We can no longer treat f as a black box, since we want to understand how Y changes with respect to X.
Reducible
Error that can potentially be reduced by using the most appropriate statistical learning technique to estimate f. The goal is to minimise the reducible error.
Irreducible
Error that cannot be reduced no matter how well we estimate f. The irreducible error is unknown and unmeasurable, and it will always place an upper bound on the accuracy of our predictions for Y.
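A minimal numpy sketch of the last few cards (the linear f, the noise level, and the "poor estimate" are all invented for illustration): even when predicting with the true f, the mean squared error cannot fall below Var(e), and a poor estimate of f adds reducible error on top.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Y = f(X) + e, with an assumed f(x) = 2x + 1 and Var(e) = 0.25
n = 10_000
X = rng.uniform(0, 10, n)
e = rng.normal(0, 0.5, n)          # irreducible error, sd = 0.5
Y = 2 * X + 1 + e

# Even with the *true* f, the mean squared prediction error
# cannot drop below Var(e) -- that is the irreducible part.
perfect_predictions = 2 * X + 1
print(f"MSE with the true f:      {np.mean((Y - perfect_predictions) ** 2):.3f}  (Var(e) = 0.25)")

# A worse estimate of f adds reducible error on top.
bad_predictions = 1.5 * X + 2
print(f"MSE with a poor estimate: {np.mean((Y - bad_predictions) ** 2):.3f}")
```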
Parametric
Models that first assume the shape of f(X), and then fit the model (e.g. we assume the data to be linear). This simplifies the problem from estimating f(X) to just estimating a set of parameters; however, if our initial assumption is wrong, this will lead to a bad result.
Non-parametric
Models that don't make any assumption about the shape of f(X), which allows them to fit a wider range of shapes but may lead to overfitting (e.g. k-NN).
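A short scikit-learn sketch contrasting the two cards above (the sine-shaped data is invented): the parametric model's linear assumption is wrong for this data, so it misses the curve, while the non-parametric k-NN adapts to it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# Non-linear ground truth, so the linear assumption is wrong on purpose
X = rng.uniform(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# Parametric: assumes f(X) is linear, then estimates just two parameters
linear = LinearRegression().fit(X, y)

# Non-parametric: no assumed shape, predicts from the k nearest neighbours
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

X_test = np.linspace(0, 6, 5).reshape(-1, 1)
print("linear:", linear.predict(X_test).round(2))
print("k-NN:  ", knn.predict(X_test).round(2))
print("truth: ", np.sin(X_test).ravel().round(2))
```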
Supervised
Models that fit input variables X to a known output variable Y.
Unsupervised
Models that take in input variables X but do not have an associated output Y to supervise the training. The goal is to understand relationships between the variables or observations.
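One possible unsupervised sketch (k-means on invented 2-D blobs): the model sees only X, with no labels, and still recovers the group structure.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Two synthetic blobs of points -- X only, no output Y to supervise training
X = np.vstack([
    rng.normal(loc=0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5, scale=0.5, size=(50, 2)),
])

# k-means groups observations purely from structure in X
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("centres:\n", kmeans.cluster_centers_.round(2))
```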
Black box
Models that make decisions, but we do not know what happens under the hood (e.g. deep learning, neural networks).
Interpretable
Models that provide insight into why they make their decisions (e.g. linear regression, decision trees).
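A quick sketch of why linear regression counts as interpretable (the feature names and effect sizes are invented): the fitted coefficients directly explain the decision.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Invented data: price driven mostly by size, a little by age
size = rng.uniform(50, 200, 100)
age = rng.uniform(0, 40, 100)
price = 3.0 * size - 0.5 * age + rng.normal(0, 5, 100)

X = np.column_stack([size, age])
model = LinearRegression().fit(X, price)

# Interpretable: each coefficient says how the prediction moves
# per unit change in that feature
for name, coef in zip(["size", "age"], model.coef_):
    print(f"{name}: {coef:+.2f}")
```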
Generative
Learns the joint probability distribution p(x, y). For example, if we wanted to distinguish between fraud and not fraud, we would build a model of what fraudulent transactions look like and one of what non-fraudulent transactions look like. Then we compare a new transaction to our two models and see which it is more similar to.
Discriminative
Learns the conditional probability distribution p(y|x). For example, we fit a line that separates the two classes and do not care about how the data was generated.
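One way to see the contrast in code (the two-blob data below is an invented stand-in for the fraud example): Gaussian naive Bayes models each class's distribution, while logistic regression learns p(y|x) directly.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Invented stand-in for the fraud example: one blob of points per class
X = np.vstack([rng.normal(0, 1, (100, 2)),    # class 0: "not fraud"
               rng.normal(3, 1, (100, 2))])   # class 1: "fraud"
y = np.array([0] * 100 + [1] * 100)

# Generative: models p(x, y) -- one Gaussian per class, then compares
generative = GaussianNB().fit(X, y)

# Discriminative: models p(y|x) directly -- just learns the boundary
discriminative = LogisticRegression().fit(X, y)

new_point = np.array([[1.5, 1.5]])
print("generative p(y|x):    ", generative.predict_proba(new_point).round(3))
print("discriminative p(y|x):", discriminative.predict_proba(new_point).round(3))
```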
Occam’s razor
Philosophical principle that the simplest explanation is the best explanation. In modelling, if we are given two models that predict equally well, we should choose the simpler one; choosing the more complex one can often result in overfitting (or just memorising the training data). Simpler is usually defined as having fewer parameters or assumptions.
Curse of dimensionality
As the number of features d grows, points become very far apart in Euclidean distance, and the entire feature space is needed to find the k nearest neighbours. Eventually, points become equidistant, which means all points are equally similar, so algorithms that rely on distance measures become pretty much useless. This is not a problem for some high-dimensional data sets, since the data lies on a low-dimensional subspace (such as images of faces or handwritten digits). In other words, the data only sits in a small corner of the feature space (think of how trees only grow near the surface of the earth, not throughout the entire atmosphere).
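A small numpy experiment (the point count and dimensions are chosen arbitrarily) showing the distance concentration this card describes: as d grows, the nearest neighbour is barely nearer than the farthest one.

```python
import numpy as np

rng = np.random.default_rng(5)

# As dimension d grows, pairwise distances between random points
# concentrate: the nearest/farthest ratio approaches 1, so "nearest"
# neighbours stop being meaningfully nearer than anything else.
for d in [2, 10, 100, 1000]:
    points = rng.uniform(size=(200, d))
    query = rng.uniform(size=d)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"d={d:>4}: nearest/farthest = {dists.min() / dists.max():.3f}")
```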