Bioinformatics Lecture 6 Flashcards

Question 1

Q

use of machine learning in biomedicine

Answer

A

most importantly personalised medicine

Question 2

Q

problem of datasets in biomedicine

Answer

A

a lot of data (many features) per sample
but not many samples
-> increases risk of overfitting

Question 3

Q

unsupervised learning

Answer

A

mainly look at clusters
uses unlabelled data
finds patterns in data

Question 4

Q

problem unsupervised learning in biology

Answer

A

different causes of same outcome

or vice versa

Question 5

Q

k-means clustering algorithm

Answer

A

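groups the data into k clusters, each point belonging to its nearest cluster centre

comparing different values of k shows how many clusters explain the data best

stops when there is no change in assignment of points anymore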
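
The loop described in this card can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names (sq_dist, kmeans), the seed and max_iter parameters and the toy points are all assumptions made for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random as cluster centres.
    centres = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points
        ]
        # Stop when no point changes cluster any more.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centres

# Hypothetical toy data: two well-separated groups of three samples.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
          (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
labels, centres = kmeans(points, k=2)
print(sorted(centres))  # [(1.0, 1.0), (11.0, 11.0)]
```

Which cluster gets label 0 and which gets label 1 depends on the random start, so only the grouping itself is meaningful.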

Question 6

Q

advantages k-means clustering algorithm

Answer

A

simple and fast
always converges (though not necessarily to the best clustering)
easy to understand

Question 7

Q

disadvantages k-means clustering algorithm

Answer

A

need to choose k manually

is non-deterministic: the result depends on the random initial assignment of points

Question 8

Q

supervised learning

Answer

A

uses labelled data

either classification or prediction (regression)

Question 9

Q

classification use

Answer

A

assigning samples to discrete groups (classes)

Question 10

Q

regression use

Answer

A

predicting continuous traits

Question 11

Q

nearest mean classifier

Answer

A

assigns a new sample to the group whose mean is nearest
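
As a rough sketch of this idea: compute one mean per group from the labelled samples, then give a new sample the label of the closest mean. The function names, the "healthy"/"tumour" labels and the numbers are made up for illustration.

```python
def class_means(samples, labels):
    """Mean feature vector for each label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in groups.items()
    }

def nearest_mean_predict(means, x):
    """Label whose mean is closest to x (squared Euclidean distance)."""
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))

# Hypothetical training data: two features per sample.
train = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
train_labels = ["healthy", "healthy", "tumour", "tumour"]

means = class_means(train, train_labels)
print(means["healthy"])                         # (1.5, 1.5)
print(nearest_mean_predict(means, (2.0, 2.0)))  # healthy
```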

Question 12

Q

problem nearest mean classifier

Answer

A

some observations are more similar to the other group than the one that they are assigned to

Question 13

Q

nearest neighbour classifier

Answer

A

assigns a new sample to the group of its nearest observation

or sometimes to the majority group among the nearest k observations
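
A small sketch of both variants, assuming Euclidean distance and a simple majority vote. The function name knn_predict and the data are hypothetical. The example includes one mislabelled-looking "tumour" sample sitting among the "healthy" ones to show how k changes the answer.

```python
from collections import Counter

def knn_predict(samples, labels, x, k=1):
    """Majority label among the k training samples closest to x."""
    # Rank training samples by squared Euclidean distance to x.
    order = sorted(
        range(len(samples)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(samples[i], x)),
    )
    votes = Counter(labels[i] for i in order[:k])
    # Ties between labels are not handled specially in this sketch.
    return votes.most_common(1)[0][0]

# Hypothetical data; the fourth sample is an outlier inside the healthy group.
train = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.2, 2.2), (8.0, 8.0), (9.0, 8.0)]
train_labels = ["healthy", "healthy", "healthy", "tumour", "tumour", "tumour"]

print(knn_predict(train, train_labels, (2.0, 2.0), k=1))  # tumour
print(knn_predict(train, train_labels, (2.0, 2.0), k=3))  # healthy
```

With k=1 the single outlier decides the label; with k=3 it is outvoted by its two healthy neighbours.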

Question 14

Q

overfitting

Answer

A

happens partly because there is so much data about each sample

the more complicated the curve, the more likely it is that a new observation doesn’t fit at all

Question 15

Q

rule of thumb against overfitting

Answer

A

you need about ten times more samples (n) than features

Question 16

Q

solutions to overfitting

Answer

A

feature selection

either by dimensionality reduction or hand-picking

Question 17

Q

dimensionality reduction

Answer

A

PCA (principal component analysis)
e.g. if height and shape vary together, there is effectively only one dimension
if not, there are more
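
The height-and-shape example can be made concrete for two variables: the share of total variance captured by the first principal component is the largest eigenvalue of the 2x2 covariance matrix divided by the sum of both eigenvalues. The sketch below uses the closed-form eigenvalues of a symmetric 2x2 matrix; the function name, the variable names and the numbers are assumptions for illustration only.

```python
import math

def first_pc_variance_share(xs, ys):
    """Fraction of total variance explained by the first principal component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    largest, smallest = half_trace + spread, half_trace - spread
    return largest / (largest + smallest)

height = [1.0, 2.0, 3.0, 4.0, 5.0]
shape_together = [2.0, 4.0, 6.0, 8.0, 10.0]   # varies together with height
shape_unrelated = [2.0, 1.0, 0.0, 1.0, 2.0]   # no linear relation to height

print(first_pc_variance_share(height, shape_together))   # 1.0
print(first_pc_variance_share(height, shape_unrelated))
```

A share of 1.0 means one dimension carries all the variation; a clearly lower share means the second dimension still holds information.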

Question 18

Q

hand-picking features

Answer

A

based on biological knowledge

e.g. with TCGA (The Cancer Genome Atlas)

Question 19

Q

disadvantage deep learning

Answer

A

big black box

hard to understand how it arrives at its predictions

Question 20

Q

cross validation

Answer

A

train model
then test it on data it was not trained on
gold standard to assess fitting

Question 21

Q

cross validation with k models risk

Answer

A

chance finding of model that doesn’t actually work

Question 22

Q

validation methods

Answer

A

leave one out cross validation

n-fold cross validation
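
Both methods follow the same pattern: hold out part of the samples, train on the rest, test on the held-out part, and repeat until every sample has been tested once. Leave one out is the special case where the number of folds equals the number of samples. The sketch below is only an illustration: it uses a one-feature nearest mean classifier as the model, calls the number of folds k, and all names and numbers are made up. Real data would normally be shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests every k-th sample from i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test

def fit_means(xs, ys):
    """Mean of the single feature for each label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def predict(means, x):
    """Label whose mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))

def cross_val_accuracy(xs, ys, k):
    """Fraction of samples predicted correctly while held out."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        # Train only on the samples outside the current fold.
        means = fit_means([xs[j] for j in train], [ys[j] for j in train])
        correct += sum(predict(means, xs[j]) == ys[j] for j in test)
    return correct / len(xs)

# Hypothetical one-feature data with two clearly separated groups.
xs = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
ys = ["low"] * 4 + ["high"] * 4

print(cross_val_accuracy(xs, ys, k=4))        # 4 folds -> 1.0
print(cross_val_accuracy(xs, ys, k=len(xs)))  # leave one out -> 1.0
```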

(22 cards)