Intro Flashcards

Question 1

Q

Data

Answer

A

Sequence Data:
Text strings using a limited alphabet: DNA/RNA (4 letters), amino acids (20 letters).
Can look at genetic variations
Can measure gene expression (RNA sequencing)

Measurement Data:
Many different variables
All genes, many proteins, many metabolites

Question 2

Q

DNA, genes, proteins, metabolites

Answer

A

DNA (genome: genomics): chromosomes, double helix. Humans have 46 chromosomes in 23 pairs, one of each pair from each parent (plants and other animals don't all have the same number). Each cell contains 2 copies of the DNA, except eggs and sperm, which contain 1.
- Transcription -> RNA
RNA (genes: transcriptomics)
- Translation -> proteins (the amino acid chain folds into a specific structure for the protein)
Proteins: proteomics
- Catalyze metabolic reactions -> metabolites
Metabolites: metabolomics

Sequence data: genomics and transcriptomics
Numerical data: transcriptomics, proteomics and metabolomics

Question 3

Q

Eukaryotic / Prokaryotic

Answer

A

Eukaryotic: nucleus, chromosomes
Prokaryotic: no nucleus, simple

Question 4

Q

Proteins

Answer

A

Cell signaling, catalyze metabolic reactions, transport metabolites, antibodies,…
-> They can bind at specific sites and cause chemical changes. Their shape determines which binding site they fit.

Question 5

Q

Metabolites

Answer

A

Chemical compounds in the body

Endogenous: produced by the organism
Exogenous: from outside the organism

Question 6

Q

Genome

Answer

A

Set of all DNA contained in a cell
Double-stranded DNA: the strands are complementary, so either strand can be reconstructed from the other
Eukaryote: linear DNA
Prokaryote: circular DNA
Viral: variable

Mitochondrial DNA is only inherited from the mother

Question 7

Q

DNA

Answer

A

4 bases/nucleotides
Complementary strands

Adenine, Guanine, Thymine, Cytosine
A-T and G-C
A-T is weaker because it has only 2 hydrogen bonds, compared with the 3 bonds of G-C

The coding strand is the DNA strand whose sequence corresponds to the mRNA, which is translated into amino acids
The non-coding (template) strand is the one transcribed to get the mRNA
Transcription: each base of the template strand is paired with its complementary base, with U used in place of T

Transcription in nucleus, translation outside nucleus
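
The strand relationships above can be sketched in Python. This is only an illustration: the function names `template_strand` and `transcribe` and the toy sequence are made up for this example, and both strands are written 5'->3', which is why the result is reversed.

```python
# Base pairing used to build the complementary DNA strand.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Pairing used during transcription: same as DNA, but U is used instead of T.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Return the template (non-coding) strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding))


def transcribe(template: str) -> str:
    """Return the mRNA obtained by transcribing the template strand."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(template))


coding = "ATGGCC"                      # toy coding strand (hypothetical)
template = template_strand(coding)     # "GGCCAT"
mrna = transcribe(template)            # "AUGGCC"
```

The mRNA ends up identical to the coding strand with every T replaced by U, which is the point of the coding/non-coding distinction.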

Question 8

Q

Sequence Statistics standard format

Answer

A

FASTA
DNA made of A,C,G,T
Element s_i is the nucleotide at position i
s(3:6): nucleotides 3 to 6
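
A minimal Python sketch of reading FASTA text and taking a subsequence with the 1-based, inclusive notation used above. The helper names `parse_fasta` and `subseq` and the toy record are assumptions for illustration; real projects would more likely use an existing parser.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}; header lines start with '>'."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines of one record are concatenated.
            records[header] += line.upper()
    return records


def subseq(s: str, start: int, end: int) -> str:
    """Return s(start:end) with 1-based, inclusive positions."""
    return s[start - 1:end]


example = ">seq1 toy example\nACGTAC\nGTTA\n"   # hypothetical record
s = parse_fasta(example)["seq1 toy example"]    # "ACGTACGTTA"
print(subseq(s, 3, 6))                          # nucleotides 3 to 6: "GTAC"
```

Python slices are 0-based and exclude the end index, hence the `start - 1` in `subseq`.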

Question 9

Q

Multinomial Sequence model

Answer

A

Nucleotides are independent
Sequence generated randomly from a probability distribution
P = (PA, PC, PG, PT)
Probability of a sequence: multiply the probabilities of its nucleotides together; position doesn't change anything!

The model doesn't fit real sequences: density plots show the nucleotide frequencies changing by region (not a fixed probability), so the nucleotides are not independent draws from one distribution! Otherwise the density would be roughly uniform along the sequence.
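
A small Python sketch of the sequence probability under the multinomial model. The probability values in `p` are made up for illustration and the function name `multinomial_prob` is hypothetical.

```python
import math


def multinomial_prob(seq: str, p: dict) -> float:
    """Probability of seq when each nucleotide is drawn independently from p."""
    return math.prod(p[base] for base in seq)


# Hypothetical distribution P = (PA, PC, PG, PT); must sum to 1.
p = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

prob_acgt = multinomial_prob("ACGT", p)   # 0.3 * 0.2 * 0.2 * 0.3
prob_tgca = multinomial_prob("TGCA", p)   # same nucleotides, different order
```

Both sequences get the same probability (up to floating-point rounding), since only the composition matters under this model, not the positions.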

Question 10

Q

Nucleotide/ A-T C-G density plot

Answer

A

Nucleotide: shows the fraction of each nucleotide at each point
A-T, C-G: shows the fraction of each pair. Can be used to discuss GC content (methylation)
Window size makes it more or less smooth/noisy
Lets you check whether the distribution changes across the sequence
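
A minimal sketch of how the values behind such a density plot could be computed, assuming a sliding window that moves one position at a time (the notes do not say whether windows overlap). The function name `window_fraction` and the toy sequence are hypothetical.

```python
def window_fraction(seq: str, bases: str, window: int) -> list:
    """Fraction of positions in each window whose nucleotide is in `bases`."""
    return [
        sum(base in bases for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


seq = "AATTGGCCGC"                     # toy sequence: AT-rich start, GC-rich end
gc = window_fraction(seq, "GC", 4)     # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
at = window_fraction(seq, "AT", 4)     # complementary fractions
```

A larger `window` averages over more positions and gives a smoother curve; a smaller one follows local changes more closely but is noisier.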

Question 11

Q

Markov Sequence Models

Answer

A

Different states the sequence can be in
There is a probability of changing state, and each state gives a probability for each nucleotide occurring next

Transition matrix: how likely to move to another state from the current state
T= from A/C/G/T (current states: rows) to A/C/G/T (next state: columns)

Probability of a sequence => follow the sequence through the transition matrix and multiply all the probabilities together
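
A minimal Python sketch of that calculation for a first-order model where the states are the nucleotides themselves. The transition values are invented for illustration, and the `initial` distribution for the first nucleotide is an assumption, since the notes do not say how the first position is handled.

```python
import math

# Hypothetical transition matrix: TRANSITION[current][next]; each row sums to 1.
TRANSITION = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}
# Assumed starting distribution for the first nucleotide (uniform here).
INITIAL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def markov_prob(seq: str, initial: dict, transition: dict) -> float:
    """Probability of a non-empty seq under a first-order Markov model."""
    prob = initial[seq[0]]
    # Walk along the sequence, multiplying one transition per step.
    for current, nxt in zip(seq, seq[1:]):
        prob *= transition[current][nxt]
    return prob


prob_acg = markov_prob("ACG", INITIAL, TRANSITION)   # 0.25 * 0.2 * 0.4
```

Unlike the multinomial model, order matters here: "ACG" and "GCA" use different entries of the matrix.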

Question 12

Q

K-mer frequency

Answer

A

Dimer: nucleotide word of length 2
Trimer: nucleotide word of length 3
K-mer: nucleotide word of length k
Can study whether some k-mers are more common than others
Frequency matrix (for dimers: rows are the first nucleotide and columns the second nucleotide; each entry is the frequency of that dimer)
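
A short Python sketch of k-mer counting and the dimer frequency matrix, assuming overlapping k-mers (every start position is counted). The function names `kmer_frequencies` and `dimer_matrix` and the toy sequence are hypothetical.

```python
from collections import Counter
import math

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer that occurs in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def dimer_matrix(seq: str) -> list:
    """4x4 matrix: rows = first nucleotide, columns = second nucleotide."""
    freqs = kmer_frequencies(seq, 2)
    return [[freqs.get(x + y, 0.0) for y in BASES] for x in BASES]


seq = "ACGTACGT"                  # toy sequence with 7 overlapping dimers
freqs = kmer_frequencies(seq, 2)  # AC, CG, GT occur twice; TA once
matrix = dimer_matrix(seq)        # matrix[0][1] is the frequency of "AC"
```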

Question 13

Q

Odds ratio

Answer

A

Compares observed and expected frequency
For a dimer xy:
Observed P(xy) / Expected P(xy), where Expected P(xy) = P(x) * P(y)
Need the frequency matrix of dimers and the frequency of each nucleotide (summing the row of a nucleotide in the dimer frequency matrix gives the frequency of that individual nucleotide)
<1: less than expected
>1: more than expected
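
A minimal Python sketch of the dimer odds ratio. For simplicity it counts single-nucleotide frequencies directly from the sequence rather than by summing rows of the dimer matrix; the function name `dimer_odds_ratio` and the toy sequence are hypothetical.

```python
from collections import Counter
import math


def dimer_odds_ratio(seq: str, dimer: str) -> float:
    """Observed P(xy) divided by the expected P(x) * P(y).

    Assumes both nucleotides of the dimer occur in seq; otherwise the
    expected frequency is 0 and the division fails.
    """
    n = len(seq)
    base_counts = Counter(seq)
    dimer_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    observed = dimer_counts[dimer] / (n - 1)
    expected = (base_counts[dimer[0]] / n) * (base_counts[dimer[1]] / n)
    return observed / expected


seq = "ACGTACGT"                        # each nucleotide has frequency 0.25
ratio_ac = dimer_odds_ratio(seq, "AC")  # (2/7) / (1/16): more than expected
ratio_aa = dimer_odds_ratio(seq, "AA")  # never observed, so the ratio is 0
```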

Question 14

Q

Nucleotide alphabet

Answer

A

ACGT
N: can be any base
R/Y/M: ambiguity codes for more than one possible base: R = A or G (purine), Y = C or T (pyrimidine), M = A or C
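
A small Python sketch of how these letters can be handled: each code stands for a set of possible bases (R = A or G, Y = C or T, M = A or C, N = any base). Only the codes listed on this card are included, not the full IUPAC table, and the function name `matches` is hypothetical.

```python
# Subset of the IUPAC nucleotide codes: each letter maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "M": "AC",
    "N": "ACGT",  # any base
}


def matches(pattern: str, seq: str) -> bool:
    """True if seq is one of the concrete sequences that pattern allows."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )


print(matches("ANR", "ACG"))   # True: N allows C, R allows G
print(matches("AYT", "AGT"))   # False: Y allows only C or T
```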

Question 15

Q

Sequence Alignment

Answer

A

Predict function: align a sequence of unknown function to a sequence with known function
Sequence divergence: mutations
Gene finding: compare genomes of different species to locate genes. Most genes are conserved with high similarity; the differences are mutations.

Question 16

Q

Homology, orthology, paralogy

Answer

A

Homology: descent from a common ancestor
Orthology: genes separated from a common ancestor by speciation (found in different species)
Paralogy: genes separated from a common ancestor by duplication

Question 17

Q

Gene duplication

Answer

A

After duplication, one copy of the gene can evolve while the other keeps the original function

(17 cards)