Ch 2.1 Flashcards
Genome
All DNA content/information in a haploid cell.
Bioinformatics
The study of the information content of genomes.
Structural genomics
An assembly of contiguous stretches of (chromosomal) DNA. (Getting the DNA, sequencing it, and putting the pieces together into something that can be read.)
Functional genomics
Characterizing the role (level of expression, biological function) played by transcripts and proteins. (What does this tell me? ABCD? What's the language? Like reading a book.)
Comparative genomics
Comparing genomes from different organisms (sequence conservation, nucleotide composition bias). (Learn lessons from comparing genomes. What is conserved, i.e. where have mutations not been allowed over millions of years? What is changing rapidly?)
Hierarchical genome shotgun (HGS)
Use random markers; organize (map) segments of DNA; choose the minimum number of overlapping clones (tiling path); sequence. (Use random markers to make a map, then sequence the minimum number of clones needed to cover the whole thing.)
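A minimal sketch of the tiling-path step, assuming each clone has already been mapped to a start and end position. The clone names and coordinates are made up for illustration, and the greedy selection shown here is just one simple way to pick few overlapping clones, not necessarily what a real sequencing project used.

```python
def tiling_path(clones, region_start, region_end):
    """Pick few clones that together cover [region_start, region_end].

    clones: list of (name, start, end) tuples with hypothetical map positions.
    Greedy rule: among the clones that start at or before the position
    covered so far, take the one that reaches farthest.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start position
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Look at every clone that overlaps the part already covered.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends past position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical mapped clones: (name, start, end)
clones = [("A", 0, 40), ("B", 10, 60), ("C", 30, 90), ("D", 50, 100), ("E", 85, 120)]
print(tiling_path(clones, 0, 120))  # prints ['A', 'C', 'E']
```

Only three of the five clones need to be sequenced here; B and D are fully redundant with their neighbours.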
Whole genome shotgun (WGS)
Fragment the entire genome into pieces; sequence all pieces; assemble all pieces into contigs that span each chromosome.
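A toy sketch of the "assemble all pieces" step, assuming error-free reads from one strand and a greedy merge of the best suffix/prefix overlap. Real assemblers are far more sophisticated; the reads and the `min_len` threshold below are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads  # nothing left to merge: these are the contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Three made-up reads that overlap by 4 bases each
print(greedy_assemble(["ATGGCCTT", "CCTTAACG", "AACGGATC"]))
# prints ['ATGGCCTTAACGGATC']
```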
Problem with Contigs
IMPORTANT: repetitive/duplicated DNA: if you don't have a map and the genome has two repetitive sequences, the assembler will end up putting them together, so there will be a GAP in the spot where the second copy is supposed to go (its true position in the sequence).
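A small sketch of why repeats cause this, using a made-up 22-base genome that contains the same 6-base sequence twice. A read that falls entirely inside the repeat matches both copies, so without a map there is no way to tell which copy it came from.

```python
def read_positions(genome, read):
    """Return every start index at which read occurs in genome."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical genome: unique + REPEAT + unique + REPEAT + unique
repeat = "GCGTAC"
genome = "AAT" + repeat + "TTCA" + repeat + "GGA"

print(read_positions(genome, "ATGCGT"))  # read with unique flank: prints [1]
print(read_positions(genome, "GCGTAC"))  # read inside the repeat: prints [3, 13]
```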
Fingerprints obtained?
RFLP: Restriction fragment length polymorphism [if clone 1 overlaps clone 2, you can put them together because there are fragments they share and fragments that are unique to each]; STS: Sequence-tagged site [an STS is an easy-to-amplify (PCR) DNA sequence that produces a simple and reproducible pattern on a gel. Each STS marker defines a unique site in the genome whose presence within a clone can be detected by PCR].
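A minimal sketch of how shared markers suggest overlapping clones, assuming each clone has already been tested by PCR for a set of STS markers. The clone and marker names are hypothetical.

```python
from itertools import combinations


def shared_markers(clones):
    """Map each pair of clones to the STS markers that both contain."""
    result = {}
    for (a, markers_a), (b, markers_b) in combinations(clones.items(), 2):
        common = markers_a & markers_b
        if common:  # a shared marker suggests the two clones overlap
            result[(a, b)] = common
    return result


# Hypothetical PCR results: clone name -> STS markers detected in it
clones = {
    "clone1": {"STS1", "STS2", "STS3"},
    "clone2": {"STS3", "STS4"},
    "clone3": {"STS5"},
}
print(shared_markers(clones))  # prints {('clone1', 'clone2'): {'STS3'}}
```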
WGS or HGS approach
Both methods rendered similar results, but the WGS method is much faster and cheaper. However, when two genome regions are very similar in sequence, the WGS project tends to lump them together. The DNA lost from the WGS assembly contained 103 genes, including 5 known disease genes. Recommendation: use a hybrid approach. Phase 1: WGS to produce sixfold coverage. Phase 2: map and produce a mini tiling path of duplicated or repetitive DNA regions for sequencing.
What should be sequenced?
Five main criteria: medical applications, evolutionary significance, environmental impact, food production, cost.
Coverage
The average number of times a base is sequenced in a genome project.
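Worked as arithmetic, average coverage is usually estimated as (number of reads × read length) / genome size. The sketch below assumes that formula; the read length and genome size are illustrative numbers, not values from these notes.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average coverage = total bases sequenced / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative numbers: 3 billion bp genome, 500 bp reads
print(coverage(36_000_000, 500, 3_000_000_000))  # prints 6.0
print(reads_needed(6, 500, 3_000_000_000))       # prints 36000000
```

Sixfold coverage is an average: some bases are read more than six times and some are not read at all.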
FASTA
A universal file format used to report sequences. A FASTA file has a description header line starting with >, followed by the actual sequence.
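A minimal sketch of reading this format, assuming plain text with one or more records and sequences that may wrap across several lines. The record names and sequences are made up; for real work a library parser such as Biopython's `SeqIO` would be the usual choice.

```python
def parse_fasta(text):
    """Return {header: sequence} for each record in FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # description line without the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


example = """>seq1 example record
ATGGCC
TTAACG
>seq2
GGATC
"""
print(parse_fasta(example))
# prints {'seq1 example record': 'ATGGCCTTAACG', 'seq2': 'GGATC'}
```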
Quality
Measured by a score that reflects the number of times a base was identified by the automated sequencer and the quality of the chromatogram for each base (i.e. peak height and even spacing).
Trace
A four-coloured graph produced by the sequencer (chromatogram).