Genome Sequencing Methods Flashcards

Question 1

Q

Physical mapping

Answer

A

BACs = bacterial artificial chromosomes
cloning vectors that carry large pieces of DNA (typically ~100-300 kb inserts)
physical mapping: provides a path of minimally overlapping clones (a tiling path) along each chromosome
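Choosing a path of minimally overlapping clones is an interval-covering problem. Below is a minimal sketch that assumes each BAC's start and end position on the chromosome is already known. The clone names and coordinates are hypothetical. It uses the standard greedy rule: from the edge of the region covered so far, take the clone that reaches furthest.

```python
from typing import List, Tuple

# (clone_name, start_bp, end_bp) with half-open coordinates; the clones and their
# positions are hypothetical and assumed to be known already
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], chrom_start: int, chrom_end: int) -> List[str]:
    """Greedily choose the fewest clones that chain together to cover
    [chrom_start, chrom_end). Each chosen clone must start at or before the end
    of the region already covered, so consecutive clones overlap or abut."""
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    path: List[str] = []
    covered_to = chrom_start
    i = 0
    while covered_to < chrom_end:
        best = None
        # among clones that begin inside the covered region, keep the one reaching furthest
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone extends coverage past position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


clones = [("A", 0, 150_000), ("B", 100_000, 260_000), ("C", 120_000, 200_000),
          ("D", 250_000, 400_000), ("E", 380_000, 500_000)]
print(minimal_tiling_path(clones, 0, 500_000))  # ['A', 'B', 'D', 'E']
```

Clone C is skipped because B already covers its span and reaches further. A gap between clones raises an error rather than silently producing a broken path.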

Question 2

Q

BAC

Answer

A

used for physical mapping
BACs = bacterial artificial chromosomes
BAC ends are sequenced to provide landmarks in the genome
BAC clones can be sequenced individually (clone-by-clone); this is now outdated and has been replaced by the whole-genome shotgun method

Question 3

Q

Shotgun sequencing

Answer

A

whole-genome sequencing

fragment the whole genome into 1-3 kb pieces
clone the fragments into vectors (make a library)
sequence clones from both ends (paired-end sequencing)
align reads using computational methods
individual reads are assembled into contigs; contigs are joined into scaffolds
many gaps often remain
can be problematic for large genomes with many similar TEs and other repetitive DNA
followed by genome annotation
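The "align using computational methods" step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch assuming error-free reads that all come from the same strand. Real assemblers also handle sequencing errors and reverse complements, and they use overlap or de Bruijn graphs. The sketch repeatedly merges the two sequences with the longest suffix-prefix overlap. Whatever cannot be merged further is left as separate contigs.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least `min_len` long; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate positions of b's first bases in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap until
    no pair overlaps by at least `min_len`; the survivors are the contigs."""
    seqs = sorted(set(reads))
    # drop reads that are fully contained in a longer read
    seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]
    while True:
        best_len, best_pair = 0, None
        for a, b in permutations(seqs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            return sorted(seqs)
        a, b = best_pair
        seqs.remove(a)
        seqs.remove(b)
        seqs.append(a + b[best_len:])  # a followed by the non-overlapping tail of b


reads = ["ATTAGACC", "GACCTGCC", "TGCCGGAA", "CGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The greedy rule also shows why repeats cause trouble. If two different reads share a repeat longer than the minimum overlap, the largest overlap can join sequences that are not neighbours in the genome.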

Question 4

Q

Contigs

Answer

A

contiguous sequences assembled from individual reads that overlap (align together)

- contigs are linked together to form scaffolds

Question 5

Q

scaffolds

Answer

A

contigs connected together in the correct order and orientation, with gaps of estimated size between them
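Once the order, orientation and approximate gap sizes between contigs are known, a scaffold can be written out as a single sequence. Gap sizes are usually estimated from paired-end reads. This minimal sketch assumes that layout is already given. It follows the common convention of filling each gap with a run of N. The contig names and layout format are hypothetical.

```python
from typing import Dict, List, Tuple


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_scaffold(contigs: Dict[str, str], layout: List[Tuple[str, str, int]]) -> str:
    """layout: ordered (contig_name, strand '+' or '-', gap_after_bp) entries.
    Contigs on the '-' strand are reverse-complemented; gaps become runs of N."""
    parts = []
    for name, strand, gap in layout:
        seq = contigs[name] if strand == "+" else reverse_complement(contigs[name])
        parts.append(seq)
        if gap > 0:
            parts.append("N" * gap)
    return "".join(parts)


contigs = {"ctg1": "ATGCC", "ctg2": "GGTAA"}
print(build_scaffold(contigs, [("ctg1", "+", 3), ("ctg2", "-", 0)]))  # ATGCCNNNTTACC
```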

Question 6

Q

Issues with shotgun sequencing

Answer

A

many gaps remain

large genomes can have many TEs that are similar to each other, plus other repetitive DNA = misassembly and misalignment of reads

Question 7

Q

Genome annotation

Answer

A

performed after shotgun sequencing and assembly
gene-finding programs (FGENESH and GENSCAN)
- repeat masking is done to hide repeated regions
- exon/intron structure is predicted by the programs
- transcriptome comparisons are used to find expressed genes and intron/exon junctions
- putative functions are assigned by comparison to related species
- detection and annotation of non-coding RNAs and TEs
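FGENESH and GENSCAN predict exon/intron structure with hidden Markov models, so their internals are well beyond a short example. The sketch below swaps in a much simpler signal: an open reading frame (ORF) scan on the forward strand. It looks for ATG ... stop in each of the three frames. It only shows the most basic coding-sequence signal. The minimum ORF length is an arbitrary assumption, and introns are ignored.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs running from ATG to the
    first in-frame stop codon; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # length in codons, stop included
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [(2, 2, 17)]
```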

Question 8

Q

Next generation sequencing

Answer

A

“2nd generation”
ultra-high throughput
- genome sequencing much faster
- 454, Illumina, Ion Torrent
- millions of sequence reads obtained per run, instead of 96 or 384 with conventional Sanger sequencing
- 454: sequencing by synthesis (pyrosequencing); no longer used
- Illumina sequencing: higher throughput than 454
– short 100-150 bp reads (often paired-end)
– used for re-sequencing to look for polymorphisms
– now occasionally used for de novo sequencing, but short reads are challenging to assemble and need to be combined with other technologies
- Ion Torrent: intermediate between 454 and Illumina; read lengths comparable to 454
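Average sequencing depth explains why millions of reads per run matter. A common estimate is C = N × L / G (Lander-Waterman), where N is the number of reads, L the read length and G the genome size. The run sizes below are hypothetical round numbers. The ~800 bp Sanger read length is an assumption.

```python
def mean_coverage(n_reads: int, read_len_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth C = N * L / G."""
    return n_reads * read_len_bp / genome_size_bp


# hypothetical Illumina run: 100 million read pairs (2 x 150 bp) on a 3 Gb genome
illumina = mean_coverage(n_reads=200_000_000, read_len_bp=150, genome_size_bp=3_000_000_000)
# hypothetical Sanger run: 384 reads of ~800 bp on the same genome
sanger = mean_coverage(n_reads=384, read_len_bp=800, genome_size_bp=3_000_000_000)
print(f"Illumina: {illumina:.1f}x, Sanger: {sanger:.6f}x")  # Illumina: 10.0x, Sanger: 0.000102x
```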

Question 9

Q

Single molecule sequencing

Answer

A

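3rd generation
- sequences single DNA molecules
- does not require amplification of the template
- PacBio SMRT sequencing (detects a single DNA polymerase copying the template in real time) and Oxford Nanopore's MinION (reads DNA as it passes through a protein nanopore)
– long reads (thousands of bp, e.g. 2,400-4,000 bp or longer)
- moderate throughput, similar to 454
- good for de novo sequencing: long reads = easier assembly
- sometimes combined with other technologies such as Illumina to correct sequencing errors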

Question 10

Q

Resequencing of genomes

Answer

A

multiple lines, cultivars, accessions, or ecotypes of a species that already has a sequenced reference genome can be sequenced and compared to that reference
- to find variants (e.g. SNPs, indels)
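Finding variants after resequencing boils down to comparing the aligned reads with the reference at each position. This minimal sketch assumes the reads are already aligned without indels and are given as (start position, sequence) pairs. It also assumes the line is homozygous, so a true SNP should appear in most reads. The depth and allele-fraction thresholds are arbitrary assumptions. Real pipelines use dedicated aligners and variant callers.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snps(reference: str, aligned_reads: List[Tuple[int, str]],
              min_depth: int = 3, min_alt_frac: float = 0.8) -> List[Tuple[int, str, str, int]]:
    """Return (position, ref_base, alt_base, depth) where the most common read base
    differs from the reference, with enough depth and allele fraction."""
    pileup: Dict[int, Counter] = {}
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pileup.setdefault(start + offset, Counter())[base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        alt, alt_count = counts.most_common(1)[0]
        if alt != reference[pos] and depth >= min_depth and alt_count / depth >= min_alt_frac:
            calls.append((pos, reference[pos], alt, depth))
    return calls


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGT")]
print(call_snps(reference, reads))  # [(4, 'A', 'T', 3)]
```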

Question 11

Q

Genotyping

Answer

A

use of DNA data to analyze relationships in or among populations

GBS (genotyping by sequencing) and RAD sequencing
Illumina sequencing of restriction-digested DNA, using barcodes so that many samples can be sequenced together in one lane
used in population and evolutionary genetic studies
previously AFLPs and microsatellites were used, but they are less efficient
the data allow analysis of SNPs among individuals
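Barcoding works because each read begins with its sample's barcode, followed by the leftover restriction-site sequence. After sequencing, reads can be sorted back to their samples by that prefix. This minimal sketch assumes exactly that layout and requires exact matches. The barcodes and sample names are hypothetical. The remnant "TGCA" is also a hypothetical value, because the real remnant depends on the enzyme used.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                remnant: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each read to a sample by its inline barcode, require the restriction-site
    remnant right after the barcode, and trim both off the read."""
    by_sample: Dict[str, List[str]] = {name: [] for name in barcodes.values()}
    unassigned: List[str] = []
    for read in reads:
        # try longer barcodes first so that, if one barcode is a prefix of another,
        # the longer exact match wins
        for bc in sorted(barcodes, key=len, reverse=True):
            if read.startswith(bc + remnant):
                by_sample[barcodes[bc]].append(read[len(bc) + len(remnant):])
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned


barcodes = {"ACGT": "line_A", "TTGCA": "line_B"}  # hypothetical barcodes
reads = ["ACGTTGCAGGATCC", "TTGCATGCAAATT", "GGGGTGCAAA"]
by_sample, unassigned = demultiplex(reads, barcodes, remnant="TGCA")
print(by_sample)   # {'line_A': ['GGATCC'], 'line_B': ['AATT']}
print(unassigned)  # ['GGGGTGCAAA']
```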

Question 12

Q

Main purposes for RNA-seq

Answer

A

transcriptome sequencing
- reference transcriptome RNA-seq to obtain a set of reference transcripts
Expression profiling
- compare gene expression levels in 2 or more samples from RNA-seq data
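A basic expression comparison between two samples first corrects for sequencing depth, then compares each gene. This minimal sketch uses counts per million (CPM) and a log2 fold change with a pseudocount. The gene names, counts, and pseudocount of 1 are hypothetical. It does no statistical testing. Real studies use replicates and dedicated tools such as DESeq2 or edgeR.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale each gene's read count by the sample's total reads."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(treated: Dict[str, int], control: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(treated / control) on CPM values; the pseudocount avoids log of zero.
    Assumes both samples contain the same genes."""
    t, c = cpm(treated), cpm(control)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in t}


control = {"geneA": 100, "geneB": 900}
treated = {"geneA": 400, "geneB": 1600}
lfc = log2_fold_change(treated, control)
print({g: round(v, 3) for g, v in lfc.items()})  # {'geneA': 1.0, 'geneB': -0.17}
```

Raw counts for geneB almost double, but once the treated sample's larger total is accounted for, its relative expression actually drops slightly.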

Question 13

Q

Illumina

Answer

A

millions of 100-150 bp reads
sometimes paired-end
useful for expression profiling: high-depth sequencing
can be used for de novo reference transcriptome sequencing, but assembly is challenging with short reads
- read counts are normalized to gene length (longer transcripts yield more reads); after normalization, more reads = higher mRNA expression
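Length normalization can be done in several ways. This sketch uses TPM (transcripts per million); RPKM/FPKM is another common choice. Each gene's count is divided by its length in kb, and the results are rescaled so they sum to one million. The gene names, counts, and lengths are hypothetical.

```python
from typing import Dict


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: reads per kb of gene length, rescaled to sum to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    scale = sum(rate.values())
    return {g: r / scale * 1e6 for g, r in rate.items()}


counts = {"short": 100, "long": 400}
lengths = {"short": 1000, "long": 4000}
print(tpm(counts, lengths))  # {'short': 500000.0, 'long': 500000.0}
```

The long gene gets four times as many reads only because it is four times longer. After normalization, both genes show the same expression level.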

Question 14

Q

454 and Ion Torrent

Answer

A

Longer reads than Illumina, so easier to align together or to a reference genome
very useful for reference transcriptome sequencing
not as widely used for expression profiling studies as illumina (read depth lower)

Question 15

Q

PacBio SMRT and Oxford Nanopore's MinION

Answer

A

long reads can span entire (full-length) transcripts
very useful for reference transcriptome sequencing
allows direct sequencing of splice isoforms

Question 16

Q

RNA-seq applications

Answer

A

new gene discovery
profiling of tissue/organ types, diseased vs wt, mutant vs wt, effects of stress/pathogens, etc.
-discovery of alternative splice sites
- expression profiling and discovery of miRNAs and siRNAs
- analyzing TF binding sites with chromatin immunoprecipitation followed by sequencing (ChIP-seq, which sequences the bound DNA rather than RNA)

Question 17

Q

Bioinformatics

Answer

A

study of biological information using concepts and methods from computer science and stats

algorithm and program development
genome database development
computational analysis of high throughput DNA sequence and expression data to answer biological questions

Question 18

Q

BLAST searching

Answer

A

BLASTn = nucleotide query vs nucleotide database
BLASTx = nucleotide query translated in all six reading frames vs protein database
BLASTp = protein query vs protein database
tBLASTn = protein query vs nucleotide database translated in all six reading frames
tBLASTx = nucleotide query vs nucleotide database, both translated in all six reading frames (slowest)
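The translated searches (BLASTx, tBLASTn, tBLASTx) rely on translating nucleotide sequence in all six reading frames: three on the given strand and three on its reverse complement. This minimal sketch does that with the standard genetic code only. The +1..+3 / -1..-3 frame labels are a naming assumption. It does not claim to reproduce BLAST's internal implementation.

```python
from typing import Dict

BASES = "TCAG"
# standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with N or other ambiguity codes become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```

tBLASTx translates both the query and every database sequence this way, which is why it is the slowest of the five programs.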

Question 19

Q

E-value

Answer

A

expectation value

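the lower (closer to 0), the more significant the hit; as a rule of thumb, hits with E above ~0.1 are considered poor (BLAST's default reporting cutoff is 10)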
represents significance of each hit
defined as number of hits one can expect to find by chance when searching a database of particular size
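For a hit with raw score S, the E-value is approximately E = K·m·n·e^(-λS). In terms of the normalized bit score S', this becomes E ≈ m·n·2^(-S'), where m is the query length and n is the total database length. This minimal sketch applies the bit-score form directly. It ignores the effective-length corrections BLAST actually uses, and the lengths and scores are hypothetical.

```python
def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E = m * n * 2**(-S'), without BLAST's effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


# hypothetical 300-residue query against a database of 10**9 residues
print(f"{evalue_from_bitscore(50, 300, 10**9):.2e}")  # 2.66e-04
print(f"{evalue_from_bitscore(30, 300, 10**9):.2e}")  # 2.79e+02
```

The same bit score gives a larger E-value in a bigger database, matching the definition above: more sequences searched means more hits expected by chance.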

Question 20

Q

a few ways in which genome size affects an organism

Answer

A

nucleus size, cell size
duration of cell cycle
cell differentiation rate
metabolic rate
embryonic developmental rate
life history strategy
invasiveness
extinction rate

Question 21

Q

4 ways TE insertions can negatively impact host

Answer

A

energetic costs of replication, transcription, translation
disruption of cellular processes by TE-encoded proteins
susceptibility to harmful GOF mutations
deleterious rearrangements caused by ectopic recombination

Question 22

Q

RepeatMasker

Answer

A

program that detects and masks (filters out) repeated sequences in genomes, based on sequence similarity to a library of known repetitive sequences
- only as good as the library of known repeats it uses
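RepeatMasker aligns the genome against a curated repeat library (e.g. Dfam or Repbase) using a search engine such as RMBlast. The sketch below swaps in a much cruder technique: exact k-mer matching. It masks every genome position covered by a k-mer that also occurs in a library repeat. Masking is either soft (lowercase) or hard (N). The library sequence and k value are hypothetical. Like the real tool, it can only find repeats that are represented in its library.

```python
from typing import Iterable


def mask_repeats(genome: str, repeat_library: Iterable[str], k: int = 8, soft: bool = True) -> str:
    """Mask genome positions covered by any k-mer shared exactly with a library repeat.
    soft=True lowercases masked bases; soft=False replaces them with N."""
    repeat_kmers = set()
    for rep in repeat_library:
        rep = rep.upper()
        for i in range(len(rep) - k + 1):
            repeat_kmers.add(rep[i:i + k])
    genome = genome.upper()
    masked = [False] * len(genome)
    for i in range(len(genome) - k + 1):
        if genome[i:i + k] in repeat_kmers:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(
        (base.lower() if soft else "N") if is_masked else base
        for base, is_masked in zip(genome, masked)
    )


library = ["GATTACAGATTACA"]  # hypothetical repeat consensus
print(mask_repeats("CCCCGATTACAGATTTTTT", library, k=6))              # CCCCgattacagattTTTT
print(mask_repeats("CCCCGATTACAGATTTTTT", library, k=6, soft=False))  # CCCCNNNNNNNNNNNTTTT
```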

Question 23

Q

Why is homozygosity important for genome sequencing?

Answer

A

facilitates genome assembly: only one copy of each chromosome needs to be assembled, because there are no allelic variants to resolve
- heterozygous differences between alleles complicate putting the genome together

Question 24

Q

When assembling a genome, why is the percentage of genes recovered higher than the percentage of the total genome that is assembled?

Answer

A

overlapping reads from many different sequencing technologies can lead to overestimation of genes

genes (mostly low-copy sequence) are easier to assemble and find
repetitive regions are hard to assemble