Bioinformatics Lecture 1 Flashcards

1
Q

DNA sequencing machine

A

DNA fragments (reads) come out in random order

2
Q

metagenomics

A

data recovered directly from environmental sources

includes microbiome

3
Q

approach / goals of bioinformatics

A

store, process, analyse, model, predict

4
Q

biomarkers

A

measurable characteristics informative about a biological state

can be e.g. genes or metabolites

5
Q

Kaplan-Meier curve

A

used for progressive diseases
handles incomplete (censored) observations
estimates survival probability over time
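
A minimal sketch of the Kaplan-Meier estimate, assuming each patient has a follow-up time and a flag saying whether the event was observed (1) or the observation was censored (0). The function name and the example numbers are illustrative, not from the lecture.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction that survived t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without lowering the curve.
        at_risk -= removed
    return curve


# Illustrative data: five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored patients still count in the risk set up to the time they drop out, which is how the estimate copes with incomplete observations.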

6
Q

sources of data in bioinformatics

A

clinical data
imaging
non-high throughput data
high throughput data

7
Q

high throughput profiling

A

automated process

outputs many different types of biological data

8
Q

Alzheimer’s progression

A

pathology is present one to two decades before symptom onset

treatment only effective if early

9
Q

FTD

A

frontotemporal dementia
rarer
little known about etiology
tau and TDP-43 proteins play a role

10
Q

mass spectrometry

A

measures the mass-to-charge ratio of ions

used to identify proteins

11
Q

gene networks

A

co-regulated / co-expressed genes

entire pathway goes up or down

12
Q

individual differences in protein expression

A

some are noise
some are natural variation unrelated to disease
some influenced by e.g. what people ate before

13
Q

gene set enrichment analysis

A

method to find sets of genes or proteins that are overrepresented in a large dataset

identifying genes that are regulated together

often related to disease phenotypes
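
One simple way to test whether a gene set is overrepresented is a hypergeometric test. This is a sketch of that over-representation test, not the rank-based running-sum statistic of GSEA proper. The function and parameter names are illustrative assumptions.

```python
from math import comb


def overrepresentation_p(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background : all genes measured
    n_in_set     : background genes that belong to the gene set (pathway)
    n_selected   : genes picked out as interesting (e.g. differentially expressed)
    n_overlap    : selected genes that belong to the gene set
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes measured, 4 in the pathway, 3 selected, all 3 in the pathway.
p = overrepresentation_p(10, 4, 3, 3)
```

A small p-value suggests the pathway shows up among the selected genes more often than chance would explain.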

14
Q

finding out functions of proteins

A

either Wikipedia
or a biobank
or BLAST

15
Q

BLAST

A

extremely widely used

output: homologous protein sequences

aligned to the query (input) sequence

16
Q

homology search

A

are there proteins with similar sequences?

often evolutionarily related

ancestral and descendant sequences

17
Q

ancestral sequences

A

from evolutionary ancestor

usually unknown

18
Q

phylogenetic tree

A

see where variants cluster and how far they are from the ancestor

19
Q

sequence alignment in evolution

A

shows which sequences are conserved in evolution

20
Q

BLAST parameters

A

e-value
substitution matrix
gap penalties
word size

21
Q

defining similarity

A

scored by an alignment score
sum of the scores for matches and mismatches
gap penalties are subtracted at the end
the resulting raw score is normalized to a bit score
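
A minimal sketch of how a raw alignment score could be computed for an alignment that is already given. It assumes flat match/mismatch scores and a linear gap penalty; BLAST itself uses a substitution matrix and affine gap penalties, so the numbers here are only illustrative.

```python
def raw_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap          # gap penalty (negative value)
        elif a == b:
            score += match        # identical residues
        else:
            score += mismatch     # substitution
    return score


# Four matches, one mismatch, one gap: 4*1 + (-1) + (-2) = 1
score = raw_alignment_score("ACGT-A", "ACCTGA")
```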

22
Q

e-value

A

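converts the bit score into a statistical significance measure

expected number of hits scoring at least this well by chance

0.01 means about 1 such chance hit per 100 searches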
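
The relation between raw score, bit score and E-value can be sketched with the Karlin-Altschul formulas. The statistical parameters `lam` and `k` depend on the scoring system; the values used in the example are placeholders, not parameters taken from the lecture.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2^(-S')."""
    return query_len * db_len * 2 ** (-bits)


# Placeholder parameters, chosen only for illustration.
bits = bit_score(raw_score=100, lam=0.267, k=0.041)
expected_hits = e_value(bits, query_len=300, db_len=1_000_000)
```

Each extra bit halves the E-value, and a larger database raises it, so the same score is less convincing when more sequences are searched.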

23
Q

PSI-BLAST

A

to find very distantly related homologs
first does a normal BLAST
then iteratively searches
hits can come in and drop out between iterations
creates an evolutionary conservation profile
in the form of a position-specific scoring matrix

24
Q

PSSM

A

position-specific scoring matrix
made after every iteration in PSI-BLAST
used as the scoring function (instead of the substitution matrix)
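
A simplified sketch of how a PSSM could be built from aligned hits and then used for scoring. It assumes a uniform background frequency and a flat pseudocount; PSI-BLAST itself uses sequence weighting and more refined pseudocounts, so this only illustrates the idea.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)   # assumption: uniform background
    pssm = []
    for pos in range(len(aligned_seqs[0])):
        # Count residues in this column, skipping gaps and unknown symbols.
        counts = Counter(s[pos] for s in aligned_seqs if s[pos] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def score_with_pssm(pssm, seq):
    """Score an ungapped sequence position by position against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


pssm = build_pssm(["ACD", "ACE", "ACD"])
```

Unlike a substitution matrix, the score for a residue here depends on the position: a conserved column rewards its usual residue and penalizes everything else.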

25
Q

master-slave alignment

A

used in both BLAST and PSI-BLAST

not the same as multiple sequence alignment

26
Q

when does iteration stop

A

technically when no new sequences within the E-value threshold are found

in practice often capped to five to avoid spurious findings
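
The stopping rule can be sketched as a loop. `toy_search` is a hypothetical stand-in for a real profile search, and the threshold of 0.005 is an assumed value; only the cap of five iterations comes from the card.

```python
def iterate_until_converged(search, query, e_threshold=0.005, max_iterations=5):
    """Repeat the search until no new hits pass the threshold or the cap is hit.

    search(query, profile) must return {sequence_id: e_value}.
    Returns (included sequence ids, number of iterations run).
    """
    included = set()
    profile = None   # first round is a normal search without a profile
    for iteration in range(1, max_iterations + 1):
        hits = search(query, profile)
        passing = {sid for sid, e in hits.items() if e <= e_threshold}
        if passing <= included:
            # Converged: nothing new came in during this round.
            return passing, iteration
        included = passing   # earlier hits may drop out here
        profile = included   # a real tool would rebuild the PSSM from these hits
    return included, max_iterations


def toy_search(query, profile):
    """Hypothetical search: a distant homolog only scores well once a profile exists."""
    if not profile:
        return {"p1": 1e-10, "p2": 0.5}
    return {"p1": 1e-12, "p2": 1e-4}


result = iterate_until_converged(toy_search, "QUERY")
```

In this toy run the second sequence is only included once the profile from the first round is used, and the loop stops in the third round because nothing new appears.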

27
Q

number of genes

A

20,000

only about 1.5% of the genome codes for proteins

28
Q

number of proteins

A

20,000

29
Q

number of amino acids

A

20

30
Q

cells in the body

A

37 trillion

31
Q

number of base pairs

A

3 billion

32
Q

number of chromosomes

A

23 pairs

33
Q

typical patient cohort size

A

20 to 500