Exam Flashcards

1
Q

Why do we not want to use the MSE error in logistic regression?

A
1) MSE for logistic regression gives a non-convex cost function.
2) The cross-entropy cost function results from maximum likelihood estimation, so the model output can be interpreted as the probability of a data point being in class 1.
2
Q

What can we do to fix high bias in our machine learning model?

A

1) More flexible models
2) Less regularization
3) Add features

3
Q

What can we do to fix high variance in our machine learning model?

A

1) Reduce model flexibility
2) Get more training data
3) Remove features
4) More regularization

4
Q

What is a necessary condition for ensemble learning to improve the results over using only one classifier?

A

The classifiers in the ensemble should have little or preferably no correlation.

5
Q

What is the difference between homogeneous and heterogeneous ensembles?

A

Homogeneous ensembles use classifiers from one ML class; heterogeneous ensembles mix classifiers from different ML classes.

6
Q

What type of error do parallel and sequential ensembles respectively reduce? Name some examples of such ensembles.

A

1) Parallel reduces variance
- Bagging
- Voting
- Random forests
2) Sequential reduces bias
- Boosting

7
Q

What kind of weak learners does the AdaBoost algorithm use?

A

Decision stumps (decision trees with a single split)

8
Q

What are the two cost functions used for decision trees? When should we use one over the other?
Do we want to minimize or maximise the costs?

A

1) Information gain:
Entropy parent - weighted average entropy children.
Entropy = - sum p(y_i) log p(y_i)
We want to maximize information gain

2) Gini index:
A weighted sum of gini scores of the leaf nodes.
Gini score = 1 - sum p(y_i)^2
We want to minimize Gini index, 0 is the best.

Generally similar performance, but for problems with many categorical variables information gain is biased towards attributes with more categories.
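
The two criteria above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the base-2 logarithm is an assumption (the card does not fix the base).

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum p(y_i) log2 p(y_i) over the classes in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini score = 1 - sum p(y_i)^2 over the classes in `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

def gini_index(children):
    """Size-weighted sum of the Gini scores of the child nodes."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * gini(ch) for ch in children)

parent = [0, 0, 1, 1]
perfect_split = [[0, 0], [1, 1]]   # each child is pure
gain = information_gain(parent, perfect_split)
index = gini_index(perfect_split)
```

For the pure split in the example, the information gain equals the parent entropy and the Gini index is 0, which matches "maximize gain" and "0 is the best".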

9
Q

What is the difference between bagging with decision trees and random forests?

A

Bagging considers all input features at every split and always picks the best one, which makes the resulting trees highly correlated. Random forests reduce this correlation by only allowing each split to choose from a random subset of the features.

10
Q

Name the idea behind:

1) SGD with momentum
2) Adagrad
3) RMSprop (or AdaDelta)
4) Adam

A

1) Add a running average of past gradients to the update
2) Divide the step elementwise by the root of the accumulated squared gradients
3) Same as 2, but use a decaying average of the squared gradients
4) Combines 3 and 1.
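
The four ideas can be compared on a toy problem. The sketch below minimizes f(w) = w^2 for a single scalar weight; the objective, the hyperparameter defaults and the function names are all assumptions chosen for illustration.

```python
import math

def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w

def sgd_momentum(w, steps=300, lr=0.1, beta=0.9):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)            # running (decaying) sum of past gradients
        w -= lr * v
    return w

def adagrad(w, steps=300, lr=0.5, eps=1e-8):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s += g * g                        # accumulate all squared gradients
        w -= lr * g / (math.sqrt(s) + eps)
    return w

def rmsprop(w, steps=300, lr=0.01, rho=0.9, eps=1e-8):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g   # decaying average instead of a full sum
        w -= lr * g / (math.sqrt(s) + eps)
    return w

def adam(w, steps=300, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g         # momentum-style average of gradients
        v = b2 * v + (1 - b2) * g * g     # RMSprop-style average of squared gradients
        m_hat = m / (1 - b1 ** t)         # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

results = {f.__name__: f(1.0) for f in (sgd_momentum, adagrad, rmsprop, adam)}
```

All four start from w = 1.0 and should end close to the minimum at 0; real implementations apply the same updates elementwise to parameter tensors.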

11
Q

What is the formula for batch normalization (During training time)?

A
z_hat = (z - batch_mean) / batch_std
z_new = y*z_hat + b, where y and b are learned parameters.
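
A minimal sketch of the training-time computation for one batch of scalar activations. The names `gamma` and `beta` stand in for the learned scale and shift, and the small `eps` added inside the square root is an assumption commonly made to avoid division by zero.

```python
import math

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations, then scale and shift."""
    n = len(z)
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / n   # biased batch variance
    std = math.sqrt(var + eps)                  # eps keeps the division stable
    z_hat = [(x - mean) / std for x in z]
    return [gamma * x + beta for x in z_hat]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
out_mean = sum(out) / len(out)
out_var = sum((x - out_mean) ** 2 for x in out) / len(out)
```

With `gamma=1` and `beta=0` the output has approximately zero mean and unit variance; the learned parameters let the network undo the normalization where that helps.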
12
Q

How can we use segmentation?

A

1) To determine volume
2) For modelling
3) To determine the location and extent of an object

13
Q

What:

1) Are partial volume effects?
2) Can cause leaking in image segmentation?
3) Can cause image artifacts in medical imaging?
4) Is the problem of anisotropic resolution?
5) Is the problem of morphological variability?

A

1) The coarse sampling of imaging can create blurry edges
2) Similar pixel values can cause one segment to “leak” into another.
3) Metal implants, hair products…
4) 3D images often have different resolutions along the different axes.
5) Objects that can change shape, such as organs.

14
Q

What is the formula for:

1) Accuracy
2) Precision
3) Recall
4) Specificity
5) Dice/ F1 score

A
  1. (TP + TN) / All
  2. TP / (TP + FP)
  3. TP / P
  4. TN / N
  5. 2 * (Precision * Recall) / (Precision + Recall)
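
The five formulas can be checked against a small confusion matrix. This is a sketch with a hypothetical helper name and made-up counts; P = TP + FN and N = TN + FP.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five metrics from confusion-matrix counts."""
    p = tp + fn            # all actual positives
    n = tn + fp            # all actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "f1": 2 * precision * recall / (precision + recall),
    }

m = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

Note that the sketch does not guard against empty denominators (for example no predicted positives), which a real evaluation script would need to handle.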
15
Q

How can we calculate surface distance?

A

Compute a distance map to the boundary of one segmentation, then trace the boundary of the second segmentation in that map. This gives the pixel-wise surface distance.

16
Q

What are some limitations of DSC?

A

Large deformations might give the same score as small translations.

17
Q

What is the idea of simple intensity thresholding?

A

Choose an intensity threshold (e.g. from the histogram of intensities) and label each pixel by whether its intensity lies above or below it.

18
Q

What are some advantages and disadvantages of simple intensity thresholding?

A
Advantages:
1) Simple
2) Fast
Disadvantages:
1) Segments may not be connected
2) Varying illumination can call for different thresholds within the same image
19
Q

What is the idea behind region growing?

A

Select a seed, and “grow” to similar pixels.
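
A minimal sketch of the idea on a tiny 2D image stored as nested lists. The similarity rule (absolute difference to the seed intensity within `tol`) and the 4-connectivity are assumptions; other criteria are possible.

```python
from collections import deque

def region_grow(image, seed, tol):
    """Return the set of (row, col) pixels that are 4-connected to the seed
    and whose intensity is within `tol` of the seed intensity."""
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region \
                    and abs(image[nr][nc] - seed_val) <= tol:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

image = [
    [10, 11, 50, 52],
    [12, 10, 51, 10],
    [11, 12, 49, 50],
]
region = region_grow(image, seed=(0, 0), tol=3)
```

The pixel at (1, 3) has the same intensity as the seed but is not reached, because it is cut off by the bright column; this is the connectedness property of region growing.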

20
Q

What are some advantages and disadvantages of region growing?

A
Advantages:
1) Relatively fast
2) Has connected regions
Disadvantages:
1) Seed usually chosen manually
2) Region must be homogeneous
3) Leakages and "rough" boundaries likely
21
Q

What is the idea behind graph cut?

A

Define an MRF (Markov random field) over all the pixels together with an energy function, then minimize the energy function.

22
Q

How can we use MRF for image restoration if a part of the image is missing?

A

Define an energy function that assigns high energy when neighbouring pixels have different values, e.g. a mean squared or capped mean squared difference.

23
Q

What are some advantages and disadvantages of graph cuts?

A
Advantages:
1) Accurate
2) Interactive
Disadvantages:
1) Requires user input
2) Difficult to select tuning parameters
24
Q

What is the idea behind active contour?

A

Define contours (e.g. circles) over the image and minimize the energy of these contours.

25
Q

What are some advantages and disadvantages of active contour?

A

Advantages:
1) Can incorporate shape constraints
2) Smooth edges and avoids leakage
Disadvantages:
1) Complex energy and regularization terms
2) Computationally expensive
3) Difficult to select hyperparameters

26
Q

What is the idea behind atlas segmentation?

A

Experts have segmented several images. These images are then registered with the new image, and the label is propagated. The multiple atlas images can either be combined or used individually with voting.

27
Q

What are some advantages and disadvantages of multi atlas label propagation?

A

Advantages:

1) Robust and accurate (ensemble)
2) Yields plausible segmentation
3) Fully automatic

Disadvantages:

1) Computationally expensive
2) Cannot deal with abnormalities

28
Q

What is the idea behind random forest segmentation?

A

Use a random forest with a patch around the pixel as input, possibly at several scales.

29
Q

What are some advantages and disadvantages of random forest segmentation?

A

Advantages:

1) Robust and accurate (ensemble)
2) Computationally efficient
3) Fully automatic

Disadvantages:

1) Shallow model, no hierarchical features
2) No guarantees on connectedness

30
Q

What are the problems with the gold standard, and how can they be solved?

A

Problems:

  1. Intra-observer variability, the same observer might create different solutions on different occasions.
  2. Inter-observer variability, different observers might create different solutions
  3. Requires expert knowledge and is time consuming

Solutions:

  1. Same observer can make several solutions
  2. Different observers can make several solutions
  3. Agreement/ disagreement can be quantified.
31
Q

Which two architectures do we have for segmentation using convonets?

A

1) Segmentation via dense classification

2) Encoder-decoder networks

32
Q

What happens when we feed a picture larger than intended into a fully convolutional neural network created for classification?

A

We get a heat (classification) map.

33
Q

What is tiling and what are the advantages/ disadvantages?

A

In a fully convolutional network we can feed the network overlapping parts (tiles) of the image instead of everything at once. By tiling correctly, the combined output is the same as feeding the whole image.

Tiling reduces memory requirements but increases redundant computation.

34
Q

What loss do we usually use for segmentation with convolutional networks?

A

pixelwise cross entropy loss.

35
Q

What is multi-scale processing in image segmentation with convnets?

A

Feed in the image at several different resolutions and send them through individual pathways in the network before combining them at the end.

36
Q

What is the problem with 0-padding or bed of nails filling in segmentation with convo nets?

A

May create artifacts in the result. Instead we could use nearest neighbour filling and mirror padding to get a smoother result.

37
Q

What is class imbalance in segmentation and how can it be solved?

A

Class imbalance means that the number of pixels/voxels belonging to each segment differs. Training on image segments, as in multi-scale processing, can to some degree self-regulate this. We could also use:

1) Weighted sampling
2) Weighted cross entropy
3) Different loss
4) Cascading (Train a ML algo to rule out “easy” background cases).

38
Q

What are some uses of unsupervised learning?

A

1) Clustering
2) Density estimation
3) Feature extraction
4) Dimensionality reduction (visualization …)
5) Anomaly detection

39
Q

What are some challenges of K-Means clustering?

A

1) It tends to create clusters of the same size
2) Sensitive to initialization
3) Handles non-linear and anisotropic data poorly

40
Q

What is the difference between an algorithm and a model?

A

A model is a formulation of the problem; an algorithm is one procedure for solving the problem.

41
Q

What is the formula for minimizing K-Means?

A

min over z, mu of sum_n sum_k z_nk * ||x_n - mu_k||^2, where z_nk = 1 if point x_n is assigned to cluster k and 0 otherwise.
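
One common way to minimize this objective is to alternate between assigning points to the nearest centre and recomputing each centre as the mean of its points. The sketch below does this for 1D data; the function name, the fixed iteration count and the example data are assumptions.

```python
def kmeans_1d(xs, init_centers, iters=10):
    """Alternate assignment and mean-update steps; return centres, labels, cost."""
    centers = list(init_centers)

    def nearest(x):
        # Index k of the centre minimizing ||x - mu_k||^2 (this sets z_nk = 1).
        return min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)

    for _ in range(iters):
        assign = [nearest(x) for x in xs]
        for k in range(len(centers)):
            members = [x for x, a in zip(xs, assign) if a == k]
            if members:                      # keep the old centre if a cluster is empty
                centers[k] = sum(members) / len(members)

    assign = [nearest(x) for x in xs]
    cost = sum((x - centers[a]) ** 2 for x, a in zip(xs, assign))
    return centers, assign, cost

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, assign, cost = kmeans_1d(xs, init_centers=[0.0, 5.0])
```

Each step can only lower the objective, but the result depends on the initial centres, which is one of the weaknesses of the method.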

42
Q

How could we handle anisotropic and non-linear data with K-means?

A

Use another loss function, e.g. graph based or kernel based.

43
Q

What is the PCA algorithm?

A
  1. Create the design matrix X
  2. Center the design matrix: X’ = X - mu
  3. UDV^T = SVD(1/(N-1) X’X’^T); keep the eigenvectors in U corresponding to the largest eigenvalues.
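
The steps can be sketched without a linear algebra library if only the first principal component is needed. Note the swap: this sketch uses power iteration on the covariance matrix instead of a full SVD, and it stores samples as rows, so the covariance is computed as X’^T X’ / (N-1). The function name and the example data are assumptions.

```python
import math

def first_principal_component(X, iters=200):
    """Centre X (rows = samples), build the covariance matrix and find its
    dominant eigenvector by power iteration (a stand-in for the SVD)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]

    def matvec(v):
        return [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]

    v = [1.0] * d                  # assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(vi * wi for vi, wi in zip(v, matvec(v)))
    return mu, v, eigval

X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # points on the line y = x
mu, v, eigval = first_principal_component(X)
```

For data on the line y = x the recovered direction is the diagonal and the eigenvalue is the variance along it.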
44
Q

How can we visualize the mean and the +2 and -2 standard deviation shapes in PCA shape models?

A

The mean shape is the data mean mu; the ±2 standard deviation shapes are found by adding/subtracting the following to/from the mean:

U * sqrt(D) * 2

45
Q

What are the interpretations of PCA?

A

1) Maximizes the explained variance after transformation
2) Fits an ellipsoid (Gaussian) to the data
3) Minimizes L2 reconstruction loss
4) Linear probabilistic model

46
Q

How can we use PCA to remove noise?

A

Add a sparsity-inducing penalty, L0, L1…

47
Q

How can we make sure that an autoencoder does not just learn the identity mapping?

A

1) Bottleneck layer
2) Regularization
3) Training on noisy input

48
Q

What is Bayesian inference?

A

Propagating probabilistic knowledge from observed variables to explanatory variables.

49
Q

What is the EM algorithm?

A

The EM algorithm iteratively forms an expectation of the log-likelihood (E-step) and then maximizes this expectation (M-step).

50
Q

How can we perform fully Bayesian GMM?

A

We add hyperpriors to the mu’s, sigma’s and the global parameter z.

A Dirichlet prior on pi_n usually favors fewer non-zero clusters; hyperpriors on mu and sigma can prevent clusters from degenerating to 0 variance.

51
Q

What are some challenges of k-means clustering?

A

Clusters tend to be the same size
Depends on initialization
Handles anisotropic data and non-linearities poorly

52
Q

Describe the steps in AAM (Active appearance models)

A
  1. Calculate the shape model (mean + eigenmodes throughout the dataset)
  2. Warp the image so it fits the landmark template
  3. Create the appearance model using PCA on the “shape-free patches”
  4. Apply PCA jointly on shape and appearance to capture correlations.
53
Q

How does R-CNN (Region-based CNN) work?

A
  1. Get ROIs (regions of interest) from a separate algorithm
  2. Forward each region through a convnet
  3. Use an SVM for classification, plus bounding-box regression
54
Q

What is the problem with R-CNN?

A

Training and inference are slow, and training is ad hoc.

55
Q

What is the SPP (spatial pyramid pooling) method for object detection?

A
  1. Send the whole image through a convnet.
  2. Extract the ROIs from the result.
  3. Use spatial pyramid pooling to reduce each ROI to a fixed size.
  4. Feed fully connected layers into an SVM for classification and into bounding-box regression.
56
Q

What is the main advantage of SPP (spatial pyramid pooling) over R-CNN?

A

Makes the testing phase faster, as we only need to run the convnet once per image. Training is still slow and ad hoc, but faster than for R-CNN.

57
Q

What is the main advantage of Fast R-CNN over SPP and R-CNN?

A

It’s fully trainable, with “fast” training and inference time.

58
Q

What is the main difference between Fast R-CNN and Faster R-CNN?

A

Faster R-CNN trains a convnet (the region proposal network) to do proposal selection. The proposal network uses a classification (object/not) loss and a bounding-box regression loss.

59
Q

What is Mask R-CNN?

A

It adds an FCN branch to Faster R-CNN that predicts a segmentation mask for each detected object (instance segmentation).

60
Q

Describe the YOLO algorithm

A
  1. Split image into grid (e.g. 7x7)
  2. For each cell, predict two (or more) bounding boxes and p(object) for each
  3. Each cell also predicts a class probability p(class | object)
  4. Combine the bounding boxes and class predictions
  5. Perform NMS (non-maximum suppression)
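
The final NMS step can be sketched as a greedy filter: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The box format (x1, y1, x2, y2), the IoU threshold of 0.5 and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap a better box that was kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In the example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box survives.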
61
Q

How are the bounding boxes trained in YOLO?

A

For each cell, find the best bounding box, adjust it and increase its confidence. For cells without objects, and for the other boxes, reduce the confidence.

62
Q

How does the Fast R-CNN RoI pooling work?

A

Divide the region proposal into a 7x7 grid and do max pooling in each grid cell.

63
Q

Why is it an advantage to use weak learners in boosting?

A

1) Boosting trains learners sequentially, and weak learners are computationally cheaper to train than strong learners
2) Boosting is prone to overfitting; using weak learners reduces the risk of overfitting.

64
Q

Describe the decision tree optimization for creating a feature space partition

A
At each node Sj:
    for each feature:
        for each value of this feature:
            evaluate I(Sj, Aj)
    choose the best feature and value for splitting
    repeat for the resulting child nodes
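
The search at a single node can be sketched as an exhaustive loop over features and thresholds. Here the weighted Gini score of the two children is used as the split measure in place of I(Sj, Aj); that choice, the `<=` threshold rule and the function names are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini score = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Try every feature and every observed value as a threshold and return
    (cost, feature_index, threshold) with the lowest weighted Gini cost."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue                      # a split must produce two children
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return best

X = [[1, 5], [2, 3], [3, 4], [4, 6]]
y = [0, 0, 1, 1]
split = best_split(X, y)
```

A full tree would call `best_split` recursively on the left and right subsets until a stopping rule (depth, purity, minimum samples) is met.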
65
Q

Name some advantages and disadvantages of decision trees.

A

Advantages:

  1. Explainable model
  2. Can handle multi class problems
  3. Can handle categorical and continuous variables
  4. Requires little preprocessing
  5. The cost is logarithmic in the number of samples

Disadvantages:

  1. Prone to overfitting
  2. Biased towards classes with more data points
66
Q

Name some methods to determine overlap between reference and computed segmentation.

A
  1. DSC (Dice similarity coefficient): 2*|A and B| / (|A| + |B|), equivalent to F1.
  2. Jaccard: |A and B| / |A U B|.
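
Both overlap measures are easy to compute when each segmentation is represented as a set of pixel coordinates. The set representation and the example masks below are assumptions made for illustration.

```python
def dice(a, b):
    """DSC = 2*|A and B| / (|A| + |B|) for two sets of pixel coordinates."""
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard = |A and B| / |A U B| for two sets of pixel coordinates."""
    return len(a & b) / len(a | b)

A = {(0, 0), (0, 1), (1, 0), (1, 1)}   # reference segmentation
B = {(0, 1), (1, 1), (0, 2), (1, 2)}   # computed segmentation, shifted by one column

d = dice(A, B)
j = jaccard(A, B)
```

The two scores are monotonically related by J = D / (2 - D), so they always rank segmentations in the same order.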

67
Q

Name some metrics for evaluating segmentations vs GT.

A
  1. Accuracy
  2. Precision
  3. Recall
  4. Specificity
  5. Dice
  6. Jaccard
  7. Surface distance
68
Q

Name some problems with image quality that might affect segmentation.

A
  1. Similar inter-class intensities
  2. Inhomogeneous intra-class intensities
  3. Image artifacts
  4. Morphological variability
  5. Partial volume effect
  6. Anisotropic resolution
69
Q

What are the formulas for the dissimilarity measures:

1) SSD (Sum of Squared Differences)
2) SAD (Sum of Absolute Differences)
3) CC (Correlation Coefficient)
4) NMI (Normalized Mutual Information)

A

1) 1/N sum (I(T(x_i)) - J(x_i))^2
2) 1/N sum |I(T(x_i)) - J(x_i)|
3) cov(I, J) / (sigma_i * sigma_j)
4) (H(I) + H(J)) / H(I, J)
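
The four measures can be sketched for two images flattened to lists of intensities, where the first list is assumed to already hold the transformed image I(T(x_i)). Entropies are estimated from exact intensity counts with a base-2 logarithm; both choices, like the function names, are assumptions (real images usually need histogram binning).

```python
import math
from collections import Counter

def ssd(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cc(a, b):
    """Correlation coefficient: cov(I, J) / (sigma_i * sigma_j)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = math.sqrt(sum((y - mb) ** 2 for y in b) / n)
    return cov / (sa * sb)

def _entropy(counts, n):
    return -sum(c / n * math.log2(c / n) for c in counts)

def nmi(a, b):
    """Normalized mutual information: (H(I) + H(J)) / H(I, J)."""
    n = len(a)
    h_a = _entropy(Counter(a).values(), n)
    h_b = _entropy(Counter(b).values(), n)
    h_ab = _entropy(Counter(zip(a, b)).values(), n)   # joint entropy
    return (h_a + h_b) / h_ab

I = [0, 0, 1, 1, 2, 2]
J = [1, 1, 2, 2, 3, 3]   # same structure, intensities shifted by one
```

The example shows why the measures differ: SSD and SAD penalize the intensity shift, while CC and NMI report a perfect match because the relationship between the two images is one-to-one.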

70
Q

Why would we want to use NMI instead of MI?

A

NMI is invariant to the amount of overlap between the images.

71
Q

What are the three steps of mapping image coordinates to world coordinates?

A

Translation, rotation, scaling.

72
Q

How can we do non-linear image transforms?

A

Control point methods, free form deformation or dense displacement fields