Shane - Lecture 3 Flashcards

1
Q

How many genes do humans have?

A

22,000 genes

2
Q

How long did it take to sequence the full human genome?

A

About 10 years

3
Q

How much of your genetic material is the exact same as a random stranger?

A

About 99% of it is identical (99.9% is the commonly cited figure)

4
Q

Why did it take so long to sequence the human genome?

A

Because the genome has 3 billion base pairs but only 22,000 genes, so most of the sequence does not code for protein

5
Q

What is computational gene prediction?

A

Identifying which genes lie on a sequence of DNA, i.e. which regions of the uncharacterised sequence code for proteins

6
Q

What information can be found via computational gene prediction?
(6)

A

Which regions code for protein

Which DNA strand encodes the gene

Which reading frame is used

Where the gene starts and ends

Where the exon-intron boundaries are in eukaryotes

Where the regulatory sequences for that gene are
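To make the strand, reading-frame and start/end questions concrete, here is a minimal sketch of a naive open reading frame (ORF) scan over all six frames. It assumes the simplest possible model (ATG start, one of the three standard stop codons, no introns), so it fits prokaryote-style sequence only. The names `find_orfs`, `reverse_complement` and `min_codons` are illustrative and not from the lecture.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for every ATG...stop ORF.

    Coordinates are 0-based and end-exclusive, measured on the strand that
    was scanned. min_codons counts the start and stop codons as well.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # [('+', 2, 2, 11)]
```

Each hit answers three of the questions at once: which strand, which frame, and where the candidate gene starts and ends. Exon-intron boundaries and regulatory sequences need the signal and content methods covered in later cards.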

7
Q

What often acts as the start codon?

A

ATG

8
Q

What makes gene finding easier in prokaryotes?
(3)

A

Small genomes

High coding density

No introns

9
Q

What is the gene-level accuracy of gene finding in prokaryotes?

A

99%

10
Q

What are the characteristics of eukaryotic genes?

A

Large genomes

Low coding density

Intron/exon structure

11
Q

What is the gene-level accuracy of gene finding in eukaryotes?

A

About 50% accuracy

12
Q

What are the problems associated with gene finding in prokaryotes?
(3)

A

Overlapping open reading frames

Very short genes: the protein might be only a few dozen amino acids long

Finding transcription start sites (TSS) and promoters

13
Q

What is a TSS?

A

The point at which RNA polymerase starts transcribing

15
Q

What are the four ways we can predict the location of genes in genomic sequences?

A

Searching by signal

Searching by content

Similarity-based methods

Comparative genomics

16
Q

What is it called when searching by signal and searching by content are done simultaneously?

A

Ab initio or intrinsic methods

17
Q

What are intrinsic methods of gene prediction used for?

A

Looking for very specific features associated with genes

18
Q

What is it called if similarity-based methods and comparative genomics are used together?

A

Extrinsic methods

19
Q

What is meant by searching by signal gene prediction?

A

The analysis of sequence signals involved in gene specification, e.g. start codons, stop codons and splice sites

20
Q

What is meant by searching by content in gene prediction?

A

Looking for statistical properties, such as codon bias, that are correlated with coding regions

21
Q

What is meant by similarity-based methods of gene prediction?

A

Use of similarity to known annotated sequences

22
Q

What is meant by comparative genomics?

A

Aligning genomic sequences from different species

23
Q

What is meant by extrinsic methods of gene prediction?
(2)

A

Asking whether our unknown gene is similar to other known gene sequences

This relies on pre-existing gene information

24
Q

How does ab initio gene finding work?
(4)

A

We input a DNA string of letters (A, C, G, T)

We get out an annotation of that string showing, for every nucleotide, whether it is coding or non-coding

Red = stop and start codons
Blue = exons
Black = introns

Identifies coding exons of protein-coding genes

25
Q

Give an example of one of the most common stop codons

A

TAA

26
Q

How does searching by signal work?

A

Four different signals are found at different sites:
- translation start codon: ATG
- 5’ splice donor site
- 3’ splice acceptor site
- translation stop codon: TAA, TAG or TGA
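The four signals above can be located with a simple pattern scan. The sketch below is illustrative only: the start and stop codons come from the card, but the GT donor and AG acceptor dinucleotides are the canonical splice-site motifs added here as an assumption, and the name `find_signals` is hypothetical. Real gene finders score longer consensus sequences, because two-letter motifs occur by chance all over a genome.

```python
import re
from typing import Dict, List

# Start and stop codons are taken from the list above. The GT/AG
# dinucleotides are an assumption (canonical splice-site motifs).
SIGNALS = {
    "start_codon": "ATG",
    "splice_donor": "GT",
    "splice_acceptor": "AG",
    "stop_codon": "TAA|TAG|TGA",
}


def find_signals(seq: str) -> Dict[str, List[int]]:
    """Return the 0-based positions of every candidate signal in seq."""
    seq = seq.upper()
    hits = {}
    for name, pattern in SIGNALS.items():
        # A lookahead keeps matches zero-width, so overlapping hits are kept.
        hits[name] = [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)]
    return hits


for name, positions in find_signals("ATGGTAAGTAG").items():
    print(name, positions)
```

Even in this 11-base toy sequence every signal type appears, which shows why signal search alone produces many false positives and is usually combined with content-based evidence.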

27
Q

List the three stop codons

A

TAA
TAG
TGA

28
Q

What can be used to look up the donor and acceptor splice sites of a sequence?

A

Consensus sequences can be used to find splice sites

29
Q

What can be used to help identify a stop signal?

A

The Cs and Ts found running up to the stop

30
Q

What does searching by content do?

A

It predicts exons accurately from content-based features, and can identify the type of exon

31
Q

What are the three types of exons?

A

Initial exons

Internal exons

Terminal exons

32
Q

What are initial exons?

A

Open reading frames delimited by a start site and 5’ donor site

33
Q

What are internal exons?

A

Open reading frames delimited by a 3’ acceptor site and 5’ donor site

34
Q

What are terminal exons?

A

Open reading frames delimited by 3’ acceptor site and stop codon
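The three exon definitions reduce to a lookup on the pair of signals that delimit an open reading frame. The sketch below encodes exactly that table. The names `Signal`, `EXON_TYPES` and `classify_exon` are hypothetical, and any signal pair not listed on the cards returns `None`.

```python
from enum import Enum
from typing import Optional


class Signal(Enum):
    START = "translation start codon"
    DONOR = "5' splice donor site"
    ACCEPTOR = "3' splice acceptor site"
    STOP = "translation stop codon"


# (left boundary, right boundary) -> exon type, as defined on the cards.
EXON_TYPES = {
    (Signal.START, Signal.DONOR): "initial",
    (Signal.ACCEPTOR, Signal.DONOR): "internal",
    (Signal.ACCEPTOR, Signal.STOP): "terminal",
}


def classify_exon(left: Signal, right: Signal) -> Optional[str]:
    """Return the exon type for an ORF delimited by the two signals."""
    return EXON_TYPES.get((left, right))


print(classify_exon(Signal.ACCEPTOR, Signal.DONOR))  # internal
```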

35
Q

Where is codon bias mostly found?

A

Found in exons more so than introns

36
Q

What is codon bias?

A

The uneven usage of synonymous codons: some codons for a given amino acid are used more frequently than others

37
Q

How can codon bias be useful?

A

It can be used to differentiate between coding and non-coding regions, because the codon frequencies typical of coding regions differ from those seen in non-coding sequence
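Codon bias can be measured by counting how often each codon appears in a chosen reading frame. This is a minimal sketch, and the function name `codon_frequencies` is illustrative. In practice the counts would come from a large set of known genes, not from one short sequence.

```python
from collections import Counter
from typing import Dict


def codon_frequencies(seq: str, frame: int = 0) -> Dict[str, float]:
    """Return the relative frequency of each codon read in the given frame."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# GCC and GCA both encode alanine, yet this toy sequence uses GCC three
# times as often as GCA.
print(codon_frequencies("GCCGCCGCAGCC"))  # {'GCC': 0.75, 'GCA': 0.25}
```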

38
Q

What are coding statistics?

A

A function that, for a given DNA sequence, computes a likelihood that the sequence codes for a protein
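One common form of coding statistic is a log-likelihood ratio built from codon frequencies. The sketch below is one possible instance, not necessarily the one used in the lecture. The function `coding_score`, the `floor` parameter and both frequency tables are hypothetical, and the numbers in the tables are invented for illustration only.

```python
import math
from typing import Dict


def coding_score(
    seq: str,
    coding_freq: Dict[str, float],
    noncoding_freq: Dict[str, float],
    floor: float = 1e-6,
) -> float:
    """Sum of log(P(codon | coding) / P(codon | non-coding)) over frame 0.

    A positive score means the sequence looks more like coding DNA under
    the two tables; a negative score means it looks more like non-coding.
    Codons missing from a table get the small probability `floor`.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        p_coding = coding_freq.get(codon, floor)
        p_noncoding = noncoding_freq.get(codon, floor)
        score += math.log(p_coding / p_noncoding)
    return score


# Made-up frequencies, for illustration only.
coding = {"GCC": 0.04, "TTT": 0.01}
noncoding = {"GCC": 0.02, "TTT": 0.02}
print(coding_score("GCCGCC", coding, noncoding) > 0)  # True
print(coding_score("TTTTTT", coding, noncoding) < 0)  # True
```

Scoring the same sequence in each of the three frames and keeping the highest score is one simple way to pick a reading frame as well.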