Exam Flashcards
Why do we not want to use the MSE loss in logistic regression?
1) MSE for logistic regression gives a non-convex cost function.
2) The cross-entropy cost function results from maximum likelihood estimation, so the model output can be interpreted as the probability of a data point being in class 1.
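A small numerical check can make the convexity claim concrete. The sketch below assumes a single training point (x = 1, y = 1) and a one-parameter model sigmoid(w * x); the function names are illustrative only. A negative second difference means the loss curves downward at that point, so it cannot be convex.

```python
import math

def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))

def mse_loss(w, x=1.0, y=1.0):
    # Squared error between the predicted probability and the label.
    return (sigmoid(w * x) - y) ** 2

def cross_entropy_loss(w, x=1.0, y=1.0):
    # Negative log-likelihood of a Bernoulli label.
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def second_difference(f, w, h=0.1):
    # Finite-difference estimate of the curvature (up to a factor h**2).
    return f(w + h) + f(w - h) - 2.0 * f(w)

mse_curvature = second_difference(mse_loss, -3.0)
ce_curvature = second_difference(cross_entropy_loss, -3.0)
print("MSE curvature at w=-3:", mse_curvature)           # negative
print("Cross-entropy curvature at w=-3:", ce_curvature)  # positive
```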
What can we do to fix high bias in our machine learning model?
1) More flexible models
2) Less regularization
3) Add features
What can we do to fix high variance in our machine learning model?
1) Reduce model flexibility
2) Get more training data
3) Remove features
4) More regularization
What is a necessary condition for ensemble learning to improve the results over using only one classifier?
The errors of the classifiers in the ensemble should have little or preferably no correlation.
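As a worked example, assume n classifiers whose errors are fully independent and that are each correct with probability p (both are idealized assumptions). The accuracy of a majority vote then follows from the binomial distribution. The function name below is illustrative.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent classifiers are correct.

    Assumes n is odd and that each classifier is correct with probability p.
    """
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

single = majority_vote_accuracy(0.7, 1)       # one classifier on its own
independent = majority_vote_accuracy(0.7, 3)  # three uncorrelated classifiers
print(single, independent)
```

If the three classifiers were perfectly correlated they would always agree, and the vote would be no better than the single classifier.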
What is the difference between homogeneous and heterogeneous ensembles?
Homogeneous ensembles use classifiers from one ML class; heterogeneous ensembles mix classifiers from different ML classes.
What type of error do parallel and sequential ensembles respectively reduce? Name some examples of such ensembles.
1) Parallel reduces variance
- Bagging
- Voting
- Random forests
2) Sequential reduces bias
- Boosting
What kind of weak learners does the AdaBoost algorithm use?
Decision stumps
What are the two cost functions used for decision trees? When should we use one over the other?
Do we want to minimize or maximize the costs?
1) Information gain:
Entropy parent - weighted average entropy children.
Entropy = - sum p(y_i) log p(y_i)
We want to maximize information gain
2) Gini index:
A weighted sum of Gini scores of the leaf nodes.
Gini score = 1 - sum p(y_i)^2
We want to minimize Gini index, 0 is the best.
Generally similar performance, but for problems with many categorical variables information gain is biased towards attributes with more categories.
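The two criteria above can be sketched in a few lines. This is a minimal illustration that assumes a base-2 logarithm (the card does not state the base) and represents each node as a plain list of class labels; all function names are illustrative.

```python
from collections import Counter
from math import log2

def class_probabilities(labels):
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]

def entropy(labels):
    # Entropy = - sum p(y_i) log p(y_i)
    return -sum(p * log2(p) for p in class_probabilities(labels))

def gini_score(labels):
    # Gini score = 1 - sum p(y_i)^2
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def weighted_average(score, children):
    # Weight each child node by its share of the samples.
    total = sum(len(child) for child in children)
    return sum(len(child) / total * score(child) for child in children)

def information_gain(parent, children):
    # Entropy of the parent minus weighted average entropy of the children.
    return entropy(parent) - weighted_average(entropy, children)

def gini_index(children):
    return weighted_average(gini_score, children)

parent = [0, 0, 1, 1]
perfect_split = [[0, 0], [1, 1]]
useless_split = [[0, 1], [0, 1]]
print(information_gain(parent, perfect_split), gini_index(perfect_split))
print(information_gain(parent, useless_split), gini_index(useless_split))
```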
What is the difference between bagging with decision trees and random forests?
Bagged decision trees consider all input features at every split and pick the best one, which makes the resulting trees highly correlated. Random forests reduce this correlation by only allowing each split to choose from a random subset of the features.
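The two sources of randomness can be sketched separately. This is only an illustration of the sampling steps, not a full tree learner; the helper names and the toy rows are hypothetical.

```python
import random

def bootstrap_sample(rows, rng):
    """Used by bagging and random forests: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def candidate_features(n_features, max_features, rng):
    """Random forest only: the feature indices one split is allowed to consider."""
    return sorted(rng.sample(range(n_features), max_features))

rng = random.Random(0)
rows = [(5.1, 3.5, 1.4, 0.2), (6.2, 2.9, 4.3, 1.3), (7.7, 3.0, 6.1, 2.3)]

sample = bootstrap_sample(rows, rng)
bagging_split = list(range(4))                # bagging: every feature is a candidate
forest_split = candidate_features(4, 2, rng)  # random forest: random subset per split
print(sample, bagging_split, forest_split)
```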
Name the idea behind:
1) SGD with momentum
2) Adagrad
3) RMSprop (or AdaDelta)
4) Adam
1) Add a running average of former gradients
2) Divide the step elementwise by the square root of the accumulated squared gradients
3) Same as 2, but use a decaying average
4) Combines 3 and 1.
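The four ideas can be written as single-parameter update steps. This is a sketch under assumptions: scalar parameters, commonly used default hyperparameters, and the classic momentum form that accumulates a decaying sum of past gradients. Function and argument names are illustrative.

```python
import math

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    # Keep a decaying accumulation of past gradients and step along it.
    v = beta * v + g
    return w - lr * v, v

def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    # Accumulate all squared gradients; divide the step by their square root.
    acc = acc + g * g
    return w - lr * g / (math.sqrt(acc) + eps), acc

def rmsprop_step(w, g, avg, lr=0.1, rho=0.9, eps=1e-8):
    # Same as Adagrad, but with a decaying average of squared gradients.
    avg = rho * avg + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(avg) + eps), avg

def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Decaying average of gradients (momentum) and of squared gradients (RMSprop),
    # with bias correction for the zero initialization; t is the step count from 1.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

w_momentum, _ = momentum_step(1.0, 2.0, 0.0)
w_adagrad, _ = adagrad_step(1.0, 2.0, 0.0)
w_rmsprop, _ = rmsprop_step(1.0, 2.0, 0.0)
w_adam, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
print(w_momentum, w_adagrad, w_rmsprop, w_adam)
```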
What is the formula for batch normalization (During training time)?
z_hat = (z - batch_mean) / batch_std, z_new = y*z_hat + b, where y and b are learned parameters. In practice batch_std is computed as sqrt(batch_var + eps) for numerical stability.
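A minimal sketch of the training-time computation for one feature over a batch of scalars. The names scale and shift stand for the learned parameters y and b in the formula; the eps value is an assumed small constant.

```python
import math

def batch_norm_train(z_batch, scale=1.0, shift=0.0, eps=1e-5):
    n = len(z_batch)
    batch_mean = sum(z_batch) / n
    batch_var = sum((z - batch_mean) ** 2 for z in z_batch) / n
    batch_std = math.sqrt(batch_var + eps)  # eps avoids division by zero
    # z_hat = (z - batch_mean) / batch_std, z_new = scale * z_hat + shift
    return [scale * (z - batch_mean) / batch_std + shift for z in z_batch]

out = batch_norm_train([1.0, 2.0, 3.0], scale=2.0, shift=1.0)
print(out)
```

After normalization the batch has mean close to shift and standard deviation close to scale.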
How can we use segmentation?
1) To determine volume
2) For modelling
3) To determine location and extent of an object
What:
1) Are partial volume effects?
2) Can cause leaking in image segmentation?
3) Can cause image artifacts in medical imaging?
4) Is the problem of anisotropic resolution?
5) Is the problem of morphological variability?
1) The coarse sampling of imaging can create blurry edges
2) Similar pixel values can cause one segment to “leak” into another.
3) Metal implants, hair products…
4) 3D images often have different resolution along the different axes.
5) Objects that can change shape, such as organs.
What is the formula for:
1) Accuracy
2) Precision
3) Recall
4) Specificity
5) Dice/ F1 score
1) (TP + TN) / All
2) TP / (TP + FP)
3) TP / P, where P = TP + FN
4) TN / N, where N = TN + FP
5) 2 * (Precision * Recall) / (Precision + Recall)
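The formulas above can be computed directly from the four confusion-matrix counts. A minimal sketch with made-up counts; it assumes every denominator is non-zero.

```python
def segmentation_metrics(tp, fp, fn, tn):
    p = tp + fn  # all actual positives
    n = tn + fp  # all actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "dice": 2 * precision * recall / (precision + recall),
    }

m = segmentation_metrics(tp=8, fp=2, fn=2, tn=88)
print(m)
```

The Dice/F1 value equals 2*TP / (2*TP + FP + FN), which is a handy cross-check.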
How can we calculate surface distance?
Compute a distance map to the boundary of the first segmentation, then trace the boundary of the second segmentation through that map and read off the values. This gives the pixel-wise surface distance.
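A brute-force sketch of the same quantity. Instead of precomputing a distance map, it measures the distance from each boundary pixel of one segmentation to the nearest boundary pixel of the other, which gives the same values on small images. Representing a mask as a set of (row, column) tuples and using 4-connectivity for the boundary are assumptions made for this illustration.

```python
import math

def boundary(mask):
    """Pixels of the mask with at least one 4-neighbour outside the mask."""
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in neighbours)
    }

def surface_distances(mask_a, mask_b):
    """Distance from every boundary pixel of mask_b to the boundary of mask_a."""
    border_a = boundary(mask_a)
    return [
        min(math.hypot(rb - ra, cb - ca) for (ra, ca) in border_a)
        for (rb, cb) in sorted(boundary(mask_b))
    ]

square = {(r, c) for r in range(3) for c in range(3)}
shifted = {(r, c + 1) for (r, c) in square}  # same square moved one pixel right

distances = surface_distances(square, shifted)
print("mean:", sum(distances) / len(distances), "max:", max(distances))
```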
What are some limitations of DSC?
Large deformations can give the same score as small translations.
What is the idea of simple intensity thresholding?
Threshold using a histogram of intensities
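A minimal sketch of the idea on a toy image stored as a list of rows. The threshold value is picked by hand here after looking at the histogram; how it is chosen in practice is not specified on the card.

```python
from collections import Counter

def intensity_histogram(image):
    # Count how often each intensity value occurs.
    return Counter(value for row in image for value in row)

def threshold(image, t):
    # Pixels brighter than t become foreground (1), the rest background (0).
    return [[1 if value > t else 0 for value in row] for row in image]

image = [
    [10, 12, 200],
    [11, 205, 198],
]
hist = intensity_histogram(image)
mask = threshold(image, t=100)  # 100 separates the two intensity clusters
print(sorted(hist.items()))
print(mask)
```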
What are some advantages and disadvantages of simple intensity thresholding?
Advantages:
1) Simple
2) Fast
Disadvantages:
1) Non-connected segments
2) Different illumination can mean we need different thresholds within the same image
What is the idea behind region growing?
Select a seed, and “grow” to similar pixels.
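A minimal sketch of region growing on a toy image. It assumes 4-connectivity and defines "similar" as an absolute intensity difference to the seed value of at most tolerance; both choices are assumptions for the illustration.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the 4-connected neighbours of the current pixel.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

image = [
    [10, 11, 90],
    [12, 95, 92],
    [11, 10, 91],
]
region = region_grow(image, seed=(0, 0), tolerance=5)
print(sorted(region))
```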
What are some advantages and disadvantages of region growing?
Advantages:
1) Relatively fast
2) Has connected regions
Disadvantages:
1) Seed usually chosen manually
2) Region must be homogeneous
3) Leakages and "rough" boundaries likely
What is the idea behind graph cut?
Define an MRF (Markov random field) over all the pixels and an energy function. Minimize the energy function.
How can we use an MRF for image restoration if a part of the image is missing?
Define an energy function that assigns high energy when neighbouring pixels have different values, e.g. the squared difference or a capped squared difference, and fill in the missing part with the values that minimize the energy.
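A sketch of such an energy on a 4-connected pixel grid, with an optional cap on the squared difference. The restoration step shown here is an exhaustive search over candidate values for a single missing pixel, which is only feasible for a toy case; function names are illustrative.

```python
def smoothness_energy(image, cap=None):
    rows, cols = len(image), len(image[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            # Count each neighbouring pair once: the pixel below and to the right.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    diff = (image[r][c] - image[nr][nc]) ** 2
                    energy += diff if cap is None else min(diff, cap)
    return energy

def fill_missing_pixel(image, pos, candidates, cap=None):
    # Try every candidate value and keep the one with the lowest energy.
    def energy_with(value):
        filled = [row[:] for row in image]
        filled[pos[0]][pos[1]] = value
        return smoothness_energy(filled, cap)
    return min(candidates, key=energy_with)

noisy = [[0, 0], [0, 10]]
print(smoothness_energy(noisy), smoothness_energy(noisy, cap=25))

damaged = [[5, 5, 5], [5, None, 5], [5, 5, 5]]  # None marks the missing pixel
restored_value = fill_missing_pixel(damaged, (1, 1), range(0, 11))
print(restored_value)
```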
What are some advantages and disadvantages of graph cuts?
Advantages:
1) Accurate
2) Interactive
Disadvantages:
1) Requires user input
2) Difficult to select tuning parameters
What is the idea behind active contour?
Define contours (e.g. circles) over the image and minimize the energy of these contours.
What are some advantages and disadvantages of active contour?
Advantages:
1) Can incorporate shape constraints
2) Smooth edges and avoids leakage
Disadvantages:
1) Complex energy and regularization terms
2) Computationally expensive
3) Difficult to select hyperparameters
What is the idea behind atlas segmentation?
Experts have segmented several images. These atlas images are then registered to the new image, and their labels are propagated. The multiple atlas images can either be combined or used individually with voting.
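The voting step can be sketched as a per-pixel majority vote. This assumes registration has already been done, so every propagated label map is aligned with the new image; the label maps below are made up.

```python
from collections import Counter

def fuse_labels(propagated_label_maps):
    rows = len(propagated_label_maps[0])
    cols = len(propagated_label_maps[0][0])
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            # Each atlas casts one vote for the label of this pixel.
            votes = Counter(label_map[r][c] for label_map in propagated_label_maps)
            fused_row.append(votes.most_common(1)[0][0])
        fused.append(fused_row)
    return fused

atlas_labels = [
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
    [[0, 0], [1, 1]],
]
fused = fuse_labels(atlas_labels)
print(fused)
```

An odd number of atlases avoids ties for binary labels.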
What are some advantages and disadvantages of multi atlas label propagation?
Advantages:
1) Robust and accurate (ensemble)
2) Yields plausible segmentation
3) Fully automatic
Disadvantages:
1) Computationally expensive
2) Cannot deal with abnormalities
What is the idea behind random forest segmentation?
Use a random forest with a patch around the pixel as input, possibly at several scales.
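The input side of this approach can be sketched as a feature extractor that returns one flat vector per pixel, which would then be passed to a random forest classifier. Two assumptions are made here: "several scales" is interpreted as patches of increasing radius, and border pixels are handled by clamping coordinates to the image edge.

```python
def extract_patch(image, r, c, radius):
    rows, cols = len(image), len(image[0])
    patch = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            # Clamp to the image so border pixels still get a full patch.
            rr = min(max(r + dr, 0), rows - 1)
            cc = min(max(c + dc, 0), cols - 1)
            patch.append(image[rr][cc])
    return patch

def multiscale_features(image, r, c, radii=(1, 2)):
    # Concatenate patches of several sizes into one feature vector.
    features = []
    for radius in radii:
        features.extend(extract_patch(image, r, c, radius))
    return features

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
centre_patch = extract_patch(image, 1, 1, radius=1)
corner_patch = extract_patch(image, 0, 0, radius=1)
features = multiscale_features(image, 1, 1)
print(centre_patch, corner_patch, len(features))
```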