Machine learning Flashcards

1
Q

Explain Parametric?

A

Parametric statistical procedures rely on assumptions about the shape of the distribution in the underlying population (e.g., that it is normal) and about the form or parameters (e.g., means and standard deviations) of the assumed distribution. Advantage: Restrictive models are much more interpretable.

2
Q

Explain Non-parametric?

A

Nonparametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample was drawn. Advantage (compared to parametric methods): They may accurately fit a wider range of possible shapes for f, the unknown function relating the predictors to the response. Disadvantage: A very large number of observations is required in order to obtain an accurate estimate of f.
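The contrast between the two cards above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the sine-shaped data, the noise level, and the helper names `fit_line` and `knn_predict` are all assumptions made for the example. The parametric model commits to a straight line with two parameters, while the non-parametric one (k-nearest neighbours) makes no assumption about the shape of f.

```python
import math
import random

def fit_line(xs, ys):
    """Parametric: assume f(x) = b0 + b1*x and estimate b0, b1 by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def knn_predict(xs, ys, x0, k=5):
    """Non-parametric: average the responses of the k training points closest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical data: a non-linear truth f(x) = sin(x) plus Gaussian noise.
random.seed(0)
xs = [i / 20 for i in range(121)]  # 121 points on [0, 6]
ys = [math.sin(x) + random.gauss(0, 0.1) for x in xs]

b0, b1 = fit_line(xs, ys)
x0 = math.pi / 2                   # the true value here is sin(pi/2) = 1
line_pred = b0 + b1 * x0           # restricted by the linear assumption
knn_pred = knn_predict(xs, ys, x0) # follows the local shape of the data
print(f"line: {line_pred:.2f}  knn: {knn_pred:.2f}  truth: 1.00")
```

The line is easy to interpret (one slope, one intercept) but cannot follow the curve; the neighbour average can, but only because there are enough observations near x0.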

3
Q

Explain Supervised?

A

For each observation of the predictor measurement(s) x_i, i = 1, …, n, there is an associated response measurement y_i.
Supervised (examples):
• Linear regression
• Logistic regression
• Support vector machines
• Neural networks
• Collaborative filtering (methods that try to fill in missing values, e.g. Netflix ratings)
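As a small illustration of the supervised setting, the sketch below fits a one-predictor logistic regression by plain gradient descent. The data, the learning rate, the number of steps, and the function names are hypothetical choices for this example; the point is only that every x_i is paired with a known response y_i that guides the fit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-loss with respect to b0 and b1.
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Hypothetical labelled data: each predictor x_i comes with a response y_i in {0, 1}.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0, 5.5]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 1.0)   # predicted probability of class 1 at x = 1
p_high = sigmoid(b0 + b1 * 5.0)  # predicted probability of class 1 at x = 5
print(f"P(y=1 | x=1) = {p_low:.2f}, P(y=1 | x=5) = {p_high:.2f}")
```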

4
Q

What is Unsupervised?

A

We observe a vector of measurements x_i, i = 1, …, n, but no associated response y_i.
Unsupervised (examples):
• Clustering
• PCA
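A minimal clustering sketch, assuming two-dimensional points and a hand-picked starting position for each centre (both are illustrative choices, as is the function name `kmeans`). It uses Lloyd's algorithm for k-means; note that no response y_i appears anywhere, and the groups are found from the x_i alone.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and updating centres."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical unlabelled data: only measurements x_i, no responses.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
```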

5
Q

What is the Bias-variance tradeoff?

A
  • We want low variance and low bias at the same time. However, when variance decreases, bias typically increases, and vice versa.
  • Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model. For example, linear regression assumes that there is a linear relationship between Y and X1, X2, ..., Xp. It is unlikely that any real-life problem truly has such a simple linear relationship, and so performing linear regression will undoubtedly result in some bias in the estimate of f.
  • Variance is the amount by which the estimate of the target function would change if it were fit on different training data.
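For squared-error loss the tradeoff can be written down explicitly. The block below is the standard decomposition of the expected test MSE at a point, under the assumption that the response is generated as Y = f(X) + ε with mean-zero noise; the symbols x_0, y_0, \hat{f} and ε are notation introduced for this sketch and do not appear on the card.

```latex
\mathbb{E}\!\left[\bigl(y_0 - \hat{f}(x_0)\bigr)^2\right]
  = \operatorname{Var}\!\bigl(\hat{f}(x_0)\bigr)
  + \Bigl[\operatorname{Bias}\!\bigl(\hat{f}(x_0)\bigr)\Bigr]^2
  + \operatorname{Var}(\varepsilon),
\qquad
\operatorname{Bias}\!\bigl(\hat{f}(x_0)\bigr) = \mathbb{E}\!\left[\hat{f}(x_0)\right] - f(x_0)
```

The last term, Var(ε), is the irreducible error: no choice of method can push the expected test MSE below it, so the goal is to keep the sum of the first two terms small.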
6
Q

What is Quality of fit?

A
  • There is no free lunch in statistics: no one method dominates all others over all possible data sets
  • An important task is to decide, for any given data set, which method produces the best results
  • Selecting the best approach can be one of the most challenging parts of performing machine learning in practice
  • In order to evaluate the performance of a machine learning method, we need to quantify the extent to which the predicted response value is close to the true response
  • A common measure is the mean squared error (MSE); computed on the data used to fit the model, it is the training MSE
  • However, we are not interested in whether our method works well on the training data
  • Rather, we are interested in how well it works on previously unseen test data
  • Suppose that we are interested in developing an algorithm to predict stock prices based on previous stock returns.
  • We can train the method using stock returns from the past 6 months
  • However, we are not interested in predicting last week's stock return / price
  • We are interested in predicting next week's prices / returns
  • How can we go about trying to select a method that minimizes the test MSE?
  • In some settings, we may have a separate test data set (set of observations which we did not use to train the model)
  • We can then simply evaluate our model on the test observations (compute the MSE of the test data)
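The gap between training and test MSE can be seen with a deliberately flexible method. The sketch below is an illustration under assumed conditions: the linear-plus-noise data, the 150/50 split, and the helper names `mse` and `one_nn_predict` are invented for the example. A 1-nearest-neighbour predictor reproduces every training response exactly, so its training MSE says nothing about how it performs on held-out observations.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between true and predicted responses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def one_nn_predict(train_x, train_y, x0):
    """Predict with the response of the single closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x0))
    return train_y[i]

# Hypothetical data: y = 2x plus Gaussian noise with standard deviation 1.
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 1.0) for x in xs]

# Hold out the last 50 observations; they are never used for fitting.
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

train_mse = mse(train_y, [one_nn_predict(train_x, train_y, x) for x in train_x])
test_mse = mse(test_y, [one_nn_predict(train_x, train_y, x) for x in test_x])
print(f"training MSE: {train_mse:.3f}  test MSE: {test_mse:.3f}")
```

For time-ordered data such as the stock example, the held-out observations would normally be the most recent ones rather than an arbitrary slice, so that the evaluation mimics predicting the future from the past.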
7
Q

How can machine learning be included in your research?

A
  • It could be used to figure out which consumers use the deposit return system, and then, based on that, to target advertising towards these consumers
  • It could be used for prediction: would the people using it now also use it in two years? (long-term change)
  • It could predict who wants to use it, based on current users with similar characteristics
  • However, this might be quite hard, given that ML usually relies on big data and the act of depositing bottles is a very analogue process, at least in Denmark: there are no technical touchpoints that connect the consumer to the behavior in an online data set.
  • A solution to this could be making the deposit return system app-based in the US.
  • For that reason, it is even more important to do consumer research projects such as ours on the topic, and to branch into even more descriptive forms of data afterwards, to be able to determine which consumers to target, and how