Intro Flashcards

1
Q

Data

A

Sequence Data:
Text strings over a limited alphabet: DNA/RNA (4 letters), amino acids (20 letters).
Can look at genetic variation
Can measure gene expression (RNA sequencing)

Measurement Data:
Many different variables
All genes, many proteins, many metabolites

2
Q

DNA, genes, proteins, metabolites

A

DNA (genome: genomics): chromosomes are double helices. Humans have 46 chromosomes in 23 pairs, one of each pair from each parent (other plants and animals have different numbers). Each cell contains 2 copies of the genome, except eggs and sperm, which contain 1.
- Transcription -> RNA
RNA (expressed genes: transcriptomics)
- Translation -> proteins (the amino acid chain folds into a structure specific to the protein)
Proteins: proteomics
- Catalyze metabolic reactions -> metabolites
Metabolites: metabolomics

Sequence data: genomics and transcriptomics
Numerical data: transcriptomics, proteomics and metabolomics

3
Q

Eukaryotic / Prokaryotic

A

Eukaryotic: nucleus, chromosomes
Prokaryotic: no nucleus, simple

4
Q

Proteins

A

Cell signaling, catalyze metabolic reactions, transport metabolites, antibodies,…
-> They can bind to specific sites and undergo chemical change. Their shape determines which binding site they fit

5
Q

Metabolites

A

Chemical compounds in the body

Endogenous: produced by the organism
Exogenous: from outside the organism

6
Q

Genome

A
Set of all DNA contained in a cell
Double-stranded DNA: the strands are complementary, so each strand can be reconstructed from the other
Eukaryote: linear DNA
Prokaryote: circular DNA
Viral: variable

Mitochondrial DNA is only inherited from the mother

7
Q

DNA

A

4 bases/nucleotides
Complementary strands

Adenine, Guanine, Thymine, Cytosine
A-T and G-C
The A-T pair is weaker: it has only 2 hydrogen bonds, compared with 3 for G-C

Coding strand: the DNA strand whose sequence corresponds to the mRNA, which is translated into amino acids
Template (non-coding) strand: the strand that is actually transcribed to produce the mRNA
Transcription: the mRNA is built from bases complementary to the template strand, with U in place of T

Transcription in nucleus, translation outside nucleus
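
The strand relationships above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up for this card, and strands are written 5' to 3', so the template strand is the reverse complement of the coding strand.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def transcribe(coding_strand):
    """The mRNA matches the coding strand, with T replaced by U."""
    return coding_strand.replace("T", "U")

coding = "ATGGCC"                      # hypothetical example sequence
template = reverse_complement(coding)  # "GGCCAT"
mrna = transcribe(coding)              # "AUGGCC"
print(template, mrna)
```

Taking the reverse complement of the template strand and replacing T with U gives the same mRNA, which is what "transcribing the template strand" amounts to.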

8
Q

Sequence Statistics standard format

A

FASTA
DNA made of A,C,G,T
Element s_i is nucleotide i
s(3:6) is nucleotides 3 to 6
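
As a rough sketch of how this notation maps to code, the Python below parses a small FASTA text by hand and extracts s_3 and s(3:6). The parser and the example record are illustrative assumptions, not a full FASTA implementation; note that the notation counts from 1 while Python counts from 0.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()   # header without the leading '>'
            parts[name] = []
        elif name is not None:
            parts[name].append(line)  # sequence may span several lines
    return {header: "".join(chunks) for header, chunks in parts.items()}

fasta_text = ">seq1 example\nACGTAC\nGGTA\n"  # made-up record
records = parse_fasta(fasta_text)
seq = records["seq1 example"]

s_3 = seq[2]       # nucleotide 3 (1-based) is index 2 in Python
s_3_6 = seq[2:6]   # nucleotides 3 to 6 inclusive
print(s_3, s_3_6)
```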

9
Q

Multinomial Sequence model

A

Nucleotides are independent
Sequence generated randomly from a probability distribution
P = (PA, PC, PG, PT)
Probability of a sequence: multiply the probabilities of its nucleotides together; the order of the nucleotides doesn't change the result!

The model doesn't fit real sequences: density plots show nucleotide frequencies changing from region to region, whereas independent nucleotides drawn from one fixed distribution would give roughly flat densities.
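
A minimal sketch of the sequence probability under the multinomial model, assuming made-up values for P. It sums logarithms instead of multiplying directly, a common way to avoid numerical underflow on long sequences.

```python
import math

def multinomial_log_prob(seq, probs):
    """Log-probability of seq when every nucleotide is drawn independently."""
    return sum(math.log(probs[base]) for base in seq)

# Hypothetical distribution P = (PA, PC, PG, PT); values must sum to 1.
probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

p_forward = math.exp(multinomial_log_prob("ACGT", probs))
p_shuffled = math.exp(multinomial_log_prob("TGCA", probs))
print(p_forward, p_shuffled)
```

Both sequences contain the same nucleotides, so they get the same probability: position does not matter in this model.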

10
Q

Nucleotide/ A-T C-G density plot

A

Nucleotide: shows the fraction of each nucleotide at each point
A-T, C-G: shows the fraction of each pair at each point. Can be used to discuss CG content (methylation)
Window size makes the plot more or less smooth/noisy
Lets you check whether the distribution changes across the sequence
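
The values behind such a plot can be computed with a sliding window. The sketch below is one possible way to do it; the function name, the `step` parameter and the toy sequence are assumptions made for illustration.

```python
def gc_density(seq, window, step=1):
    """Fraction of G or C in each window of the sequence."""
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / window)
    return fractions

seq = "ATATATATGCGCGCGC"  # toy sequence: AT-rich half, then GC-rich half
narrow = gc_density(seq, window=4, step=4)
wide = gc_density(seq, window=8, step=8)
print(narrow, wide)
```

A larger window averages over more nucleotides, which smooths the curve but hides short-range changes.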

11
Q

Markov Sequence Models

A

Different states the sequence can be in
Probability of changing state; each state has a probability for each nucleotide occurring next

Transition matrix: how likely it is to move to another state from the current state
T = from A/C/G/T (current state: rows) to A/C/G/T (next state: columns)

Probability of a sequence => follow the sequence through the transition matrix and multiply all the probabilities together
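
A small sketch of this calculation for a first-order chain whose states are the nucleotides themselves. The transition values are invented, and the initial distribution for the first nucleotide is an assumption added here, since the card does not say how the first position is scored.

```python
def markov_prob(seq, initial, transition):
    """Probability of seq: start probability times each transition in turn."""
    if not seq:
        return 1.0
    p = initial[seq[0]]
    for current, nxt in zip(seq, seq[1:]):
        p *= transition[current][nxt]  # row = current state, column = next
    return p

# Assumed uniform start distribution (not given in the notes).
initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical transition matrix; each row sums to 1.
transition = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

p_acg = markov_prob("ACG", initial, transition)
print(p_acg)
```

Unlike the multinomial model, the order of the nucleotides matters here, because each factor depends on the preceding nucleotide.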

12
Q

K-mer frequency

A

Dimer: nucleotide word of length 2
Trimer: nucleotide word of length 3
K-mer: nucleotide word of length k
Can study whether some k-mers are more common than others
Frequency matrix (for dimers: rows are the first nucleotide, columns the second; each entry is the frequency of that dimer)
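
Counting k-mers and building the dimer frequency matrix could look like the sketch below. It counts overlapping k-mers, which is an assumption about the convention, and the function names and toy sequence are illustrative.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def dimer_frequency_matrix(seq, alphabet="ACGT"):
    """Rows: first nucleotide, columns: second nucleotide."""
    counts = kmer_counts(seq, 2)
    total = sum(counts[x + y] for x in alphabet for y in alphabet)
    return [
        [counts[x + y] / total if total else 0.0 for y in alphabet]
        for x in alphabet
    ]

seq = "ACGTACGT"  # toy sequence
dimers = kmer_counts(seq, 2)
matrix = dimer_frequency_matrix(seq)
print(dimers)
print(matrix[0][1])  # frequency of "AC": row A, column C
```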

13
Q

Odds ratio

A

Compares observed and expected frequency
For a dimer xy:
Observed P(xy) / Expected P(xy), where Expected P(xy) = P(x) * P(y) if the nucleotides were independent
Needs the dimer frequency matrix and the frequency of each nucleotide (summing a nucleotide's row in the dimer frequency matrix gives how often it appears as a first letter, which is essentially its individual frequency)
<1: less than expected
>1: more than expected
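
As a rough illustration, the odds ratio for every dimer can be computed as below. The sketch assumes the sequence contains only A, C, G and T, and it takes the single-nucleotide frequencies directly from the sequence rather than from row sums of the dimer matrix.

```python
from collections import Counter

def dimer_odds_ratios(seq, alphabet="ACGT"):
    """Observed dimer frequency divided by the product of base frequencies."""
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in alphabet}
    dimers = Counter(seq[i:i + 2] for i in range(n - 1))
    total = n - 1  # number of overlapping dimers
    ratios = {}
    for x in alphabet:
        for y in alphabet:
            expected = base_freq[x] * base_freq[y]
            if expected > 0:  # skip dimers whose bases never occur
                ratios[x + y] = (dimers[x + y] / total) / expected
    return ratios

ratios = dimer_odds_ratios("ACGTACGT")  # toy sequence
print(ratios["AC"], ratios["AA"])
```

In this toy sequence "AC" is over-represented (ratio above 1), while "AA" never occurs (ratio 0).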

14
Q

Nucleotide alphabet

A

ACGT
N: can be any base
R/Y/M: ambiguity codes that stand for a set of bases (R = A or G, Y = C or T, M = A or C)
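
One way to use these codes in practice is to map each letter to the set of bases it can stand for. The sketch below covers only the letters on this card (the full IUPAC alphabet has more), and the `matches` helper is a hypothetical name.

```python
# Subset of the IUPAC nucleotide codes: letter -> bases it may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "M": "AC",
    "N": "ACGT",  # any base
}

def matches(pattern, seq):
    """True if every base in seq is allowed by the code at that position."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, seq))

print(matches("ARN", "AGT"))  # R allows G, N allows T
print(matches("AYN", "AGT"))  # Y does not allow G
```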

15
Q

Sequence Alignment

A

Predict function: align a sequence of unknown function to a sequence of known function
Sequence divergence: mutations
Gene finding: compare genomes of different species to locate genes. Most genes are conserved with high similarity; the differences come from mutations.

16
Q

Homology, orthology, paralogy

A

Homology: descent from a common ancestor
Orthology: homologs separated by speciation (the genes are in different species)
Paralogy: homologs separated by gene duplication

17
Q

Gene duplication

A

After a duplication, one copy of the gene can evolve while the other keeps the original function