13 - Whole genome analysis Flashcards

1
Q

How has Illumina sequencing changed recently?

A

Reads used to be limited to 50 to 300 base pairs. Thanks to paired-end reads you can now cover up to 600 bp (2 x 300).

The equipment is cheaper and you don’t need as high a concentration of DNA anymore.

2
Q

Why shouldn’t you want to find the sequence of a genome just for the heck of it?

A

Only annotated genomes are useful.

3
Q

Most genomes are annotated automatically. What’s a drawback to this?

A

Humans make the highest-quality annotations, but manual annotation just isn’t realistic in most cases, so automatic annotations tend to be less accurate.

4
Q

Give some general steps of annotating a genome

A
  • Generate ORFs from a completed sequence
  • Do a homology search for different factors (metabolic pathways, frameshift detection, gene families etc.)
  • Combine the above search with more specialized searches (e.g. DNA motifs, regulatory elements and repetitive sequences).
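
The first step above (generating ORFs) can be illustrated with a minimal Python sketch. It is only a toy: it assumes an ORF runs from an ATG to the first in-frame stop codon, scans all six reading frames, and ignores introns, alternative start codons and ambiguous bases. The names `find_orfs` and `reverse_complement` are made up for this example and do not come from any annotation tool.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[str, int, int, str]]:
    """Scan all six reading frames for ATG...stop stretches.

    Returns (strand, frame, start, orf) tuples, where start is the 0-based
    index on the scanned strand and orf includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) // 3 >= min_codons:
                        orfs.append((strand, frame, start, orf))
                    start = None  # close the ORF and keep scanning
    return orfs


print(find_orfs("CCATGAAATTTTAGCC"))
```

In a real pipeline the ORFs found this way would then be passed on to the homology searches in the next step.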
5
Q

What are the two most common approaches to define gene structure?

A

Prediction based (ab initio): algorithms designed to find genes/gene structures based on nucleotide sequence and composition.

Sequence similarity (evidence driven): alignment to mRNA sequences (ESTs) and proteins from the same species or related species; identification of domains and motifs.

These are often done in combination.

6
Q

What is prediction based (ab initio) gene structure searching?

A

Software (gene predictors) that uses mathematical models built from the accumulated knowledge about gene structure in a particular type of organism. These tools provide a rapid way to conduct a preliminary analysis of raw genome data, but they have low accuracy and can’t deal with alternative splicing and other complex situations. They are more effective on prokaryotic genomes.

This uses what is already known about gene structure.

7
Q

Describe evidence driven gene prediction

A

Uses external data to find genes and determine their precise boundaries and features, such as introns and alternative splicing patterns. Its ability to produce accurate gene models depends on the nature and quality of the data available.

8
Q

What sort of elements might an ab initio prediction based search look for in a eukaryotic genome to find genes?

A
  • Promoter regions
  • 5’ UTR
  • initial exon
  • introns
  • Protein coding regions
  • 3’ UTR
  • poly-A tail (in mRNA)
  • Intergenic DNA
9
Q

What is RepeatFinder / RepeatMasker?

A

A tool that can look for many kinds of repeated sequence in a raw genomic dataset.

It uses a comprehensive database of repeated DNA.

It can label (annotate) regions of repeats and mask them to exclude them from further analysis.
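
The masking idea can be sketched in a few lines of Python. This is not how RepeatMasker itself works internally; it only shows what "masking" means once repeat coordinates are known. The function name `mask_repeats` and the 0-based, end-exclusive interval convention are assumptions made for this example. Hard masking replaces repeat bases with N, soft masking lowercases them.

```python
from typing import Iterable, Tuple


def mask_repeats(seq: str, repeats: Iterable[Tuple[int, int]], hard: bool = True) -> str:
    """Mask annotated repeat intervals (0-based, end-exclusive) in a sequence.

    hard=True  -> replace repeat bases with 'N' (excluded from later analysis)
    hard=False -> lowercase repeat bases (still readable, but flagged)
    """
    chars = list(seq)
    for start, end in repeats:
        # Clamp the interval so out-of-range coordinates cannot raise.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


print(mask_repeats("ACGTACGTAC", [(2, 6)]))              # hard mask
print(mask_repeats("ACGTACGTAC", [(2, 6)], hard=False))  # soft mask
```

Soft masking keeps the original bases available for later steps, which is why it is often preferred when the masked sequence is passed on to gene prediction.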

10
Q

What is tRNAScan?

A

A tool that looks for potential tRNA genes and annotates them in the genome sequence.

11
Q

What is gene ontology and how can it be used for genome analysis?

A

The Gene Ontology (GO) is a controlled vocabulary of terms for describing gene product characteristics in three domains: molecular function, biological process and cellular component (localization).

GO terms can be used to classify genes into functional categories (e.g. metabolism, stress related, immunity) as well as by where their products are located in the cell (e.g. nucleus, mitochondrion).
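
Classifying genes by their annotation terms is essentially a grouping operation, as in this small Python sketch. The gene names (`geneA`, `geneB`, `geneC`) are hypothetical, the category labels are taken from the examples above rather than from real GO identifiers, and `group_by_term` is a name invented for this illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_term(annotations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a gene -> terms mapping into a term -> genes mapping."""
    groups = defaultdict(list)
    for gene, terms in annotations.items():
        for term in terms:
            groups[term].append(gene)
    # Sort gene lists so the output is stable regardless of input order.
    return {term: sorted(genes) for term, genes in groups.items()}


# Hypothetical annotations, for illustration only.
annotations = {
    "geneA": ["metabolism"],
    "geneB": ["metabolism", "immunity"],
    "geneC": ["stress related"],
}
for term, genes in group_by_term(annotations).items():
    print(term, genes)
```

A gene can appear under several terms, which reflects the fact that one gene product can have more than one annotation.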

12
Q

What is the Kyoto Encyclopedia of Genes and Genomes (KEGG)?

A

A database resource that can be used to assign metabolic pathways to the genes within the genome of the organism being studied.
