Bioinformatics Lecture 6 Flashcards

1
Q

use of machine learning in biomedicine

A

most importantly personalised medicine

2
Q

problem of datasets in biomedicine

A

a lot of data (many features measured) per sample,
but not many samples
-> increases the risk of overfitting

3
Q

unsupervised learning

A

uses unlabelled data
finds patterns in the data
mainly by looking for clusters

4
Q

problem unsupervised learning in biology

A

different causes can lead to the same outcome

or the same cause to different outcomes

5
Q

k-means clustering algorithm

A

partitions the data into k clusters: each point is assigned to its nearest cluster centre, then the centres are recomputed as the mean of their assigned points, and this repeats

stops when there is no change in the assignment of points anymore

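A minimal k-means sketch in Python (scikit-learn and toy random data assumed, not from the lecture), showing that k is fixed in advance and the loop ends once assignments stop changing:

```python
# k-means sketch: k chosen up front; scikit-learn iterates
# (assign each point to its nearest centre, recompute the centres)
# until the assignments no longer change.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 100 toy samples, 2 features

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])                   # cluster assignment per sample
print(km.cluster_centers_)               # final cluster centres (means)
```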
6
Q

advantages k-means clustering algorithm

A

simple and fast
always converges to a result
easy to understand

7
Q

disadvantages k-means clustering algorithm

A

need to choose k manually

non-deterministic: the result depends on the initial placement of the cluster centres

8
Q

supervised learning

A

uses labelled data

either classification or regression (prediction)

9
Q

classification use

A

assigning samples to discrete groups (classes)

10
Q

regression use

A

predicting continuous traits

11
Q

nearest mean classifier

A

assigns a new sample to the group whose mean (centroid) is nearest

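A sketch of a nearest-mean (nearest-centroid) classifier, assuming scikit-learn's NearestCentroid and made-up toy points:

```python
# Nearest-mean classifier: compute one mean per class, then assign a
# new sample to the class whose mean is closest.
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])                       # class labels

clf = NearestCentroid().fit(X, y)
print(clf.predict([[0.9, 1.1], [4.8, 5.2]]))     # -> [0 1]
```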
12
Q

problem nearest mean classifier

A

some observations lie closer to the mean of the other group than to the mean of their own group, so they are misclassified

13
Q

nearest neighbour classifier

A

assigns a new sample the label of its nearest observation

or sometimes the majority label of its nearest k observations (k-NN)

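A k-nearest-neighbour sketch (scikit-learn assumed; the toy data is illustrative): with k = 3 the new sample takes the majority label of its three nearest observations:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # k = 3
print(knn.predict([[1.1, 0.9], [5.0, 5.1]]))          # -> [0 1]
```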
14
Q

overfitting

A

the model fits noise instead of the underlying pattern; partly caused by having so much data (so many features) about each sample

the more complicated the fitted curve, the more likely it is that a new observation doesn't fit it at all

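A small illustration of the "complicated curve" problem (numpy assumed; the data and polynomial degrees are invented): a degree-7 polynomial passes through every training point but predicts a new observation far worse than a simple line:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = 2 * x + rng.normal(scale=0.1, size=x.size)   # roughly linear toy data

simple = np.polyfit(x, y, deg=1)       # simple model
wiggly = np.polyfit(x, y, deg=7)       # complicated curve through every point

x_new = 1.1                            # a new observation
print(np.polyval(simple, x_new))       # close to the underlying trend (~2.2)
print(np.polyval(wiggly, x_new))       # typically far off: overfitting
```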
15
Q

rule of thumb against overfitting

A

you need roughly ten times more samples than features (e.g. with 100 features, aim for at least 1,000 samples)

16
Q

solutions to overfitting

A

feature selection

either by dimensionality reduction or hand-picking

17
Q

dimensionality reduction

A

PCA (principal component analysis)
e.g. if height and shape vary together, one dimension (component) is enough to describe both
if not, more dimensions are needed

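A PCA sketch of the height/shape example (scikit-learn assumed; the numbers are invented): two features that vary together are captured almost entirely by a single principal component:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
shape = 0.5 * height + rng.normal(0, 1, size=200)    # varies with height
X = np.column_stack([height, shape])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)   # first component explains nearly all variance
```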
18
Q

hand-picking features

A

based on biological knowledge

e.g. with TCGA (The Cancer Genome Atlas)

19
Q

disadvantage deep learning

A

a big black box

hard to interpret why it makes a given prediction

20
Q

cross validation

A

train model
then test it on data it was not trained on
gold standard to assess fitting

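A minimal cross-validation sketch (scikit-learn and toy data assumed): each score is computed on a fold the model was not trained on:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # toy labels

scores = cross_val_score(LogisticRegression(), X, y, cv=5)   # 5-fold CV
print(scores, scores.mean())                                 # held-out accuracies
```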
21
Q

cross validation with k models risk

A

when k models are compared, one may perform well on the test data purely by chance, even though it doesn't actually work

22
Q

validation methods

A

leave-one-out cross-validation (LOOCV): each sample is held out once as the test set

n-fold cross-validation: the data is split into n folds, each used once as the test set
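A sketch of the two splitting schemes (scikit-learn assumed; 10 toy samples): leave-one-out holds out a single sample per round, n-fold holds out 1/n of the data per round:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, KFold

X = np.arange(10).reshape(-1, 1)    # 10 toy samples

print(sum(1 for _ in LeaveOneOut().split(X)))      # 10 rounds, 1 test sample each
print(sum(1 for _ in KFold(n_splits=5).split(X)))  # 5 rounds, 2 test samples each
```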