Genomics & NCBI Accessions Flashcards
Problems with sequencing
Data is not always perfect
Each region needs to be covered about 10 times
20% of the reactions fail
Ex. Takes 400 days to cover the human genome
How many base pairs is the human genome?
3.3x10^9 bp
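Worked example: a rough estimate of how many sequencing reactions the figures above imply (10x coverage, 20% failed reactions, 3.3x10^9 bp). The read length of 500 bp is an assumption made only for illustration; it is not given in these cards. This is a back-of-the-envelope sketch, not a description of any particular sequencing project.

```python
import math

def reads_needed(genome_bp, coverage, read_len_bp, failure_rate):
    """Return (successful_reads, attempted_reactions) for a target coverage."""
    # Total bases to sequence = coverage * genome size; each read yields read_len_bp.
    successful = math.ceil(coverage * genome_bp / read_len_bp)
    # Only (1 - failure_rate) of attempted reactions succeed.
    attempted = math.ceil(successful / (1.0 - failure_rate))
    return successful, attempted

GENOME_BP = 3_300_000_000   # from the card above
COVERAGE = 10               # each region covered about 10 times
FAILURE_RATE = 0.20         # 20% of reactions fail
READ_LEN_BP = 500           # ASSUMPTION: illustrative read length

successful, attempted = reads_needed(GENOME_BP, COVERAGE, READ_LEN_BP, FAILURE_RATE)
print(f"successful reads needed: {successful:,}")
print(f"reactions to attempt:    {attempted:,}")
```

Changing READ_LEN_BP shows how strongly the workload depends on read length.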
What is the basis of whole genome shotgun sequencing?
Sequence first, map later
Genome is cut into small pieces and sequenced
Overlapping regions are put in order
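Worked example: the "put overlapping regions in order" step can be sketched as a greedy merge of fragments by their longest suffix-to-prefix overlap. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, no repeats); real assemblers use more sophisticated methods. The fragment sequences and the function names are made up for the example.

```python
def overlap_len(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_len(a, b, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

# Hypothetical fragments, given out of order.
fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(fragments))
```

If two fragments share no overlap, the function returns more than one contig, which corresponds to a gap in the assembly.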
When does shotgun sequencing work well?
Small bacterial genomes that don’t have much repetitive DNA
When sequencing different individuals of the same species
What is done when gaps are encountered in shotgun sequencing?
PCR primers are designed from the ends of the contigs that flank the gap region
Gap is amplified by PCR
PCR product is sequenced directly
Works well for small gaps
What is done for large gaps that can’t be bridged by PCR?
The gap region is cloned into a low-copy cloning vector such as a BAC, which has an F-plasmid origin of replication
Both ends of each BAC insert are sequenced (paired ends) until the gap is covered
Why is repetitive DNA a problem for genome sequencing?
If a repeat is sequenced, there is no way to tell which part of the genome it came from
The unique flanking region is not included in the sequenced fragment
Ex: LINEs (long interspersed nuclear elements), which are long repeat units
Shotgun sequencing alone is therefore not very effective for many eukaryotes
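Worked example: a small sketch of why a read that lies entirely inside a repeat cannot be placed, while a read that includes unique flanking sequence can. The genome and repeat strings below are invented for illustration, and exact string matching stands in for real read alignment.

```python
def find_all(genome, read):
    """Return every start position where `read` occurs exactly in `genome`."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Hypothetical toy genome: the same repeat appears twice, with unique flanks.
REPEAT = "GATTACAGGC"
GENOME = "TTGACC" + REPEAT + "CATCGA" + REPEAT + "CCTAGT"

repeat_only_read = REPEAT[2:8]   # lies entirely inside the repeat
flanked_read = GENOME[3:12]      # 3 bp of unique flank plus the start of the repeat

print(repeat_only_read, find_all(GENOME, repeat_only_read))  # two possible origins
print(flanked_read, find_all(GENOME, flanked_read))          # one possible origin
```

The repeat-only read matches both copies, so its origin is ambiguous; the flanked read matches a single position.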
What is the basis of the ordered clone approach?
Map first, sequence later
How is map first, sequence later done?
Physical map that contains ordered clones is produced first
Large fragments of genomic DNA are cloned into BAC vectors
Determine which BACs have overlapping sequences
Pick a minimal tiling path: the smallest number of BACs that covers a specific region
Each BAC is sequenced using the shotgun approach
BAC sequences are assembled into long contigs
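Worked example: choosing a minimal tiling path can be sketched as a greedy interval-cover problem, assuming each BAC's position on the physical map is already known as a (start, end) pair. The clone names and coordinates are hypothetical, and the function is an illustration of the idea rather than the procedure used by any specific mapping project.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones whose intervals cover [region_start, region_end].

    clones: dict mapping clone name -> (start, end) on the physical map.
    Greedy rule: among clones starting at or before the covered edge,
    take the one that extends coverage the furthest.
    """
    path = []
    covered_to = region_start
    remaining = sorted(clones.items(), key=lambda item: item[1][0])
    index = 0
    while covered_to < region_end:
        best_name, best_end = None, covered_to
        while index < len(remaining) and remaining[index][1][0] <= covered_to:
            name, (start, end) = remaining[index]
            if end > best_end:
                best_name, best_end = name, end
            index += 1
        if best_name is None:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best_name)
        covered_to = best_end
    return path

# Hypothetical BAC positions in kb.
bacs = {
    "BAC_A": (0, 150), "BAC_B": (50, 180), "BAC_C": (100, 300),
    "BAC_D": (140, 260), "BAC_E": (250, 420), "BAC_F": (290, 400),
}
print(minimal_tiling_path(bacs, 0, 420))
```

Only the clones on the returned path would then be shotgun sequenced; the others are redundant for covering the region.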
What percent of the genome is transposable elements?
45%
What percent of the genome is exons?
3%
What was the purpose of the 1000 Genomes Project?
to find the extent of genome variation among individuals
What is the ENCODE project doing?
- find which regions of the human genome are transcribed into RNA and which bind transcription factors (TFs)
- understand chromatin structure
What is the goal of the ENCODE project?
to characterize all noncoding DNA in the genome
Define transcriptome.
- the complete set of RNA transcripts in a cell or tissue; it shows where and when each gene is expressed