Lecture 2 - Biology for Computational Genomics Flashcards

1
Q

What are the two forms of computational formulation?

A

-numerically encoded input with a computable objective function
i.e. f(x)

-numerically encoded input with a computable test for significance
e.g. the probability that the nucleotides are i.i.d. (independent and identically distributed)

2
Q

What is the rule of maximum parsimony?

A

-if there are multiple solutions to a problem, the one involving the fewest steps is assumed to be the biologically correct one

3
Q

What are the three steps for the computational formulation of genome rearrangements?

A
  1. transform the problem into numeric input
  2. define the model
  3. define the problem
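
As a minimal sketch of these three steps, one common textbook choice (an assumption here, the card does not name a model) is to encode a genome as a signed permutation of numbered blocks, take reversals as the model, and pose the problem as sorting the permutation with as few reversals as possible. The hypothetical helpers `count_breakpoints` and `reverse_segment` below only illustrate the encoding and the operation; they do not solve the sorting problem.

```python
def count_breakpoints(perm):
    """Count breakpoints in a signed permutation of 1..n.

    The permutation is framed with 0 and n+1. An adjacent pair (a, b) is a
    breakpoint unless b - a == 1, which also accepts inverted runs like (-3, -2).
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reverse_segment(perm, i, j):
    """Apply a reversal to perm[i..j] inclusive: reverse the order, flip the signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


# Step 1: numeric input (blocks 2 and 3 are inverted relative to the reference).
genome = [1, -3, -2, 4]
# Step 2: the model allows reversals; step 3: find the fewest that reach the identity.
print(count_breakpoints(genome))                         # 2
print(count_breakpoints(reverse_segment(genome, 1, 2)))  # 0
```

Breakpoints are used here only as a rough measure of how far a genome is from the reference order.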
4
Q

What is the repeat structure in the human genome?

A

-the human genome is highly repetitive: about 50% of it is repeats, not counting the centromeres; adding the centromeres brings the total to around 55%
-the repeat structure of genomes affects computational problems such as genome assembly and read alignment

5
Q

What are STRs or short tandem repeats?

A

-a repeated monomer sequence, typically 2-7 bases long, repeated up to several hundred times
-STRs make up 4.3% of the genome
-the mutation rate of STRs is very high
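
A minimal sketch of how such repeats could be located in a sequence, assuming perfect (mismatch-free) copies only. The function name `find_strs` and the `min_copies` threshold are hypothetical choices for illustration; real STR callers also tolerate mismatches and indels.

```python
import re


def find_strs(seq, min_unit=2, max_unit=7, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex prefer the shortest repeating unit,
    so a run of a single base is reported with a 2-base unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


print(find_strs("TTGATAGATAGATAGATACC"))  # [(2, 'GATA', 4)]
```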

6
Q

What is the societal and biological importance of STRs?

A

-13 STR loci are used in forensic databases
-STR expansions are linked to diseases such as ALS

7
Q

What are mobile elements?

A

-mobile DNA elements are sequences that copy themselves or hijack reverse transcription systems
-pieces of DNA that carry the information the cell needs to copy them and insert the copy back into the genome
-autonomous elements encode the proteins that copy that very sequence of DNA, which is a positive feedback loop; they are often hypermethylated, which keeps this from going out of control
-nonautonomous elements get copied into RNA and inserted back via an autonomous element, so they cannot move on their own

8
Q

What are segmental duplications?

A

-a sequence at least 1 kb long, not a mobile element or a tandem repeat, that is duplicated elsewhere in the genome with at least 90% identity
-is mosaic and has multiple overlapping regions in the genome

9
Q

How are segmental duplications drivers of evolution and disease?

A

-they drive nonallelic homologous recombination (NAHR), which occurs between two sequences of DNA that are highly similar but are not alleles
-NAHR results in the duplication or deletion of a region

10
Q

What are the classes of repeats in the human genome?

A

-STRs (2-6bp)
-variable number repeats (7-500bp)
-mobile elements (300-8,000bp)
-segmental duplications (1,000bp-1,000,000bp)

11
Q

What is the binary representation of DNA?

A

A=00
T=11
G=10
C=01
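
A minimal sketch of this encoding in code, using the same bit assignments as the card. The names `ENCODE`, `pack`, and `unpack` are hypothetical helpers for illustration, and the sketch assumes the sequence contains only A, C, G, and T (no N or other ambiguity codes).

```python
# Two-bit codes as given on the card.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {code: base for base, code in ENCODE.items()}


def pack(seq):
    """Pack a DNA string into one integer, two bits per base, first base highest."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from a packed integer."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(DECODE[(value >> shift) & 0b11])
    return "".join(bases)


packed = pack("ACGT")
print(format(packed, "08b"))  # 00011011
print(unpack(packed, 4))      # ACGT
```

One convenient property of this particular assignment is that flipping both bits of a code (XOR with `0b11`) gives the complementary base: A and T swap, and C and G swap.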

12
Q

What are some binary conversions?

A

one base = two bits
one byte = eight bits
one byte = 4 bases
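
These conversions give a quick storage estimate. The sketch below uses a hypothetical helper `packed_bytes`; the figure of 3.1 billion bases is only an approximate haploid human genome size, used here as an assumption for illustration.

```python
def packed_bytes(n_bases):
    """Bytes needed to store n_bases at two bits per base (4 bases per byte), rounded up."""
    return (n_bases + 3) // 4


n = 3_100_000_000          # assumed approximate genome length in bases
print(packed_bytes(n))     # 775000000 bytes at 2 bits per base
print(n)                   # 3100000000 bytes at one 8-bit character per base
```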

13
Q

memory size

A
14
Q

What is the organization of DNA?

A

-5' untranslated region (UTR), then the open reading frame (ORF), then the 3' untranslated region (UTR)

15
Q

Where does RNA splicing occur the most?

A

-at the GU/AG sequences of introns: most introns begin with GU at the 5' end and end with AG at the 3' end

16
Q

What is a gene model?

A

set of isoforms known to exist for a gene

17
Q

What can RNA form to influence splicing?

A

-a secondary structure, such as a hairpin, that can influence splicing

18
Q

What do LOGO plots show?

A

-the relative frequency of each nucleotide at each position of a binding motif
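
A minimal sketch of the numbers behind such a plot, assuming a set of aligned, equal-length binding sites. The function name `column_stats` and the example sites are hypothetical. In the usual sequence-logo convention the total stack height at a position is its information content in bits, and each letter's height is its frequency times that value; no small-sample correction is applied here.

```python
import math
from collections import Counter


def column_stats(sites):
    """For aligned, equal-length sites, return a (frequencies, bits) pair per position."""
    stats = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column)
        freqs = {base: counts.get(base, 0) / total for base in "ACGT"}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        # With four possible nucleotides, the maximum information is 2 bits.
        stats.append((freqs, 2.0 - entropy))
    return stats


sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # made-up example sites
for position, (freqs, bits) in enumerate(column_stats(sites)):
    print(position, freqs, round(bits, 3))
```

A fully conserved position scores 2 bits, and a position where all four nucleotides are equally frequent scores 0.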
