W8L1 Genomic: genome assembly and genetic markers Flashcards

1
Q

Some problems with an initial genome assembly

A
  • Shotgun sequencing alone does not define a genome; the reads still require assembly and annotation.
  • The genome sequence does not provide all the information: chromatin conformation and epigenetic states are important too.
  • A single genome does not represent the whole species.
2
Q

Why is genome information important?

A

Genome information is highly valuable for defining a species and for gaining information on the structure and function of the organism.

3
Q

Why are genetic markers used as landmarks?

A
  • To assess genetic diversity
  • To gain positional information about the genetic basis of traits
4
Q

The 1000 Genomes Project

A

From one genome to 1000 genomes: an increasing appreciation of the importance of variation.
Mostly re-sequencing: shotgun reads from many individuals are aligned against a reference genome.

5
Q

Sequencing the genome

A

Shotgun sequencing using high-throughput short-read technology.
Short random fragments of DNA are sequenced across the genome to a given depth of coverage.
Fragments can be sequenced as:
- Single reads (typically 50–1000 bp)
- Paired-end reads of varying insert size (note that paired-end reads can overlap)
- Mate-pair libraries, which span larger genomic regions (~2–20 kb inserts), with reads generally facing outwards
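
The "depth of coverage" mentioned above can be estimated with the standard Lander-Waterman approximation: mean depth C = N × L / G for N reads of length L on a genome of size G. If reads are assumed to land uniformly at random, the expected fraction of bases covered at least once is roughly 1 − e^(−C). A minimal Python sketch; the function names and the example numbers are illustrative assumptions, not values from the lecture.

```python
import math

def mean_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Expected mean depth of coverage, C = N * L / G."""
    return n_reads * read_len / genome_size

def expected_breadth(depth: float) -> float:
    """Expected fraction of the genome covered by at least one read,
    assuming reads are placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-depth)

# Hypothetical example: 2 million 150 bp reads on a 100 Mb genome.
c = mean_depth(2_000_000, 150, 100_000_000)
print(f"mean depth: {c:.1f}x, expected breadth: {expected_breadth(c):.3f}")
```

Real libraries deviate from this model (GC bias, repeats), so the breadth figure is best read as an upper bound.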

6
Q

Assembling a genome

A

First, define contigs based on sequence overlap between reads, represented in a de Bruijn graph.
Second, scaffold the contigs using large-insert (mate-pair) libraries or long-read technology.
Third, integrate the scaffolds into maps, using either physical maps (such as BAC maps) or genetic maps (linkage maps).
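
The contig step can be illustrated with a toy de Bruijn graph: every k-mer in a read becomes an edge from its (k−1)-mer prefix to its (k−1)-mer suffix, and a contig is read off by following unambiguous edges. This is a minimal sketch, not a real assembler: the reads, the value of k, and the function names are made up for illustration, and it ignores sequencing errors, reverse complements, and in-degree checks.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    found in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def extend_contig(graph, start):
    """Follow edges from `start` while exactly one successor exists,
    stopping at a branch, a dead end, or an already visited node."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig

# Three hypothetical overlapping reads from the same region.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn(reads, k=4)
print(extend_contig(graph, "ATG"))
```

Branches in the graph (nodes with more than one successor) are where repeats or variants break contigs, which is why scaffolding with longer-range information is needed afterwards.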

7
Q

Annotating a genome

A

- Use gene prediction models, expression data, and homologous protein identification (from databases)
- The final gene models combine multiple sources of evidence
- Compare the annotated gene set against biological expectations

8
Q

Some problems with gene annotation

A

With increasing biological complexity, genome size becomes a poor proxy for gene number.

9
Q

Genome complexity in terms of repetitive elements

A

The reassociation kinetics of denatured DNA fragments from E. coli and bovine genomes are a function of the initial DNA concentration multiplied by the incubation time (C0t).
- The E. coli DNA reassociates at a uniform rate, consistent with each DNA fragment being represented once in the genome.
- The bovine DNA fragments exhibit two distinct steps in their reassociation, consistent with a fast-reassociating repetitive fraction and a slow-reassociating single-copy fraction.
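
The one-step versus two-step behaviour can be reproduced with the idealized second-order model usually used for C0t curves, in which the fraction of DNA still single-stranded is 1 / (1 + k × C0t) for each kinetic class. The sketch below is an illustration under assumed parameters: the genome fractions and rate constants are invented for the example and are not measured values for E. coli or bovine DNA.

```python
def fraction_single_stranded(c0t, components):
    """components: list of (genome_fraction, rate_constant) pairs.
    Each class follows ideal second-order kinetics, 1 / (1 + k * C0t)."""
    return sum(frac / (1.0 + k * c0t) for frac, k in components)

# Assumed, illustrative parameters (not measured values).
single_class = [(1.0, 1.0)]                  # every fragment present once
two_classes = [(0.4, 1000.0), (0.6, 0.001)]  # fast repeats + slow unique DNA

for c0t in (1e-4, 1e-2, 1.0, 1e2, 1e4):
    one = fraction_single_stranded(c0t, single_class)
    two = fraction_single_stranded(c0t, two_classes)
    print(f"C0t={c0t:g}  single class: {one:.3f}  two classes: {two:.3f}")
```

With these assumed values the two-class curve drops, plateaus near the unique fraction, then drops again, which is the two-step shape described above.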

10
Q

Coupling chemical or enzymatic treatment with sequencing

A

Bisulfite conversion
- Non-methylated cytosine is converted to uracil.
- Comparing treated vs untreated DNA samples identifies differentially methylated cytosines.
Chromatin immunoprecipitation (ChIP)
- An antibody specific to a protein of interest (e.g. a histone) is used to isolate the DNA subfraction interacting with that protein.
Assay for Transposase-Accessible Chromatin (ATAC)
- A modified Tn5 transposase coupled with sequencing adapters targets accessible, open chromatin segments of the DNA sample.
High-throughput Chromosome Conformation Capture (Hi-C)
- Captures fragments that are in close spatial proximity in the DNA sample by cross-linking them.
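
The logic of bisulfite conversion can be sketched in a few lines: unmethylated C becomes U and is read as T after amplification and sequencing, while methylated C is protected, so any C that survives treatment marks a methylated position. This is a toy simulation on a single strand; the sequence, the methylated position, and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by sequencing:
    unmethylated C is read as T (C -> U -> T); methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylation(untreated, treated):
    """Return positions where a C in the untreated sequence is still a C
    after treatment, i.e. the cytosine was protected by methylation."""
    return [
        i for i, (u, t) in enumerate(zip(untreated, treated))
        if u == "C" and t == "C"
    ]

reference = "ACGTCCGA"  # hypothetical untreated sequence
treated = bisulfite_convert(reference, methylated_positions={1})
print(treated)
print(call_methylation(reference, treated))
```

Real analyses work on aligned reads from many molecules and report a methylation proportion per cytosine rather than a yes/no call.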

11
Q

What is a genetic marker?

A

Genetic markers are simple genetic landmarks used as proxies for more complex and less accessible causal sources of variation.
Their use relies explicitly on linkage disequilibrium.
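
Linkage disequilibrium between a marker and a causal locus is commonly summarized by D = p_AB − p_A × p_B and by r² = D² / (p_A (1 − p_A) p_B (1 − p_B)). The sketch below computes both for two biallelic loci; the function name and the example frequencies are illustrative assumptions.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.
    p_ab: frequency of the haplotype carrying alleles A and B
    p_a, p_b: frequencies of allele A and allele B."""
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical cases: marker perfectly tags the causal allele,
# versus marker independent of it.
print(linkage_disequilibrium(0.5, 0.5, 0.5))
print(linkage_disequilibrium(0.25, 0.5, 0.5))
```

An r² near 1 means the marker genotype predicts the causal genotype well; an r² near 0 means the marker carries no information about it.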

12
Q

What is a good marker?

A
  • Polymorphic and Abundant
  • Unambiguous/Repeatable
  • Neutral (Not causal)
  • Co-dominant
13
Q

Why are SNPs used as reference markers?

A
  • Most abundant
  • Easy to genotype, using either microarrays or direct sequencing
14
Q

Genetic marker coverage: depth vs breadth

A

There are two dimensions to genomic data:
- breadth of coverage (completeness)
- depth (accuracy)
Genome size and complexity in non-model species often limit whole-genome sequencing approaches, forcing the use of reduced-representation approaches, which may be targeted or untargeted.
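
The trade-off between the two dimensions can be made concrete by computing both from a per-base depth vector. In this sketch the same sequencing effort (10 read-bases) is spread either across most of a region or concentrated on two positions; the depth values and the function name are invented for illustration.

```python
def breadth_and_depth(per_base_depth, min_depth=1):
    """Return (breadth, mean depth over covered positions).
    per_base_depth: read depth at every position of the region."""
    covered = [d for d in per_base_depth if d >= min_depth]
    breadth = len(covered) / len(per_base_depth)
    depth = sum(covered) / len(covered) if covered else 0.0
    return breadth, depth

# Hypothetical depth profiles with the same total of 10 read-bases.
whole_genome = [1, 0, 2, 1, 0, 1, 1, 2, 0, 2]    # broad but shallow
reduced_repr = [0, 0, 5, 5, 0, 0, 0, 0, 0, 0]    # narrow but deep

print(breadth_and_depth(whole_genome))
print(breadth_and_depth(reduced_repr))
```

Reduced representation gives up breadth to gain depth at the sampled positions, which is what makes genotype and frequency estimates more reliable there.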

15
Q

Genetic markers for population genetics: absence of prior knowledge

A

For population-level information:
- Use Restriction-site Associated DNA sequencing (RAD-seq)
- Increases the depth of coverage for a set of neutral markers
- Allows accurate estimation of allele frequencies
- Gives random coverage of the genome (no linkage with specific loci)
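
The idea behind restriction-based reduced representation can be sketched as an in-silico digest: only sequence next to a restriction site is sampled, so the same small set of loci is recovered in every individual without any prior knowledge of the genome. The sketch below uses the EcoRI recognition sequence GAATTC and keeps the bases immediately downstream of each site. It is a simplification under stated assumptions: the input sequence, the tag length, and the function name are made up, and real protocols cut within the site and sequence both flanks.

```python
import re

def rad_tags(sequence, site="GAATTC", tag_len=10):
    """Return (site_position, tag) pairs, where tag is the tag_len bases
    immediately downstream of each occurrence of the restriction site."""
    tags = []
    # Lookahead so that overlapping site occurrences are also found.
    for match in re.finditer(f"(?={re.escape(site)})", sequence):
        start = match.start() + len(site)
        tag = sequence[start:start + tag_len]
        if len(tag) == tag_len:  # skip sites too close to the end
            tags.append((match.start(), tag))
    return tags

# Hypothetical sequence containing two EcoRI sites.
genome = "TT" + "GAATTC" + "ACGTACGTAC" + "GGG" + "GAATTC" + "TTTTTAAAAA" + "CC"
print(rad_tags(genome))
```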

16
Q

Genetic markers for population genetics: with prior knowledge

A

- Use bait-based capture / exon capture to hybridize known sequences
- Feed the captured fragments into next-generation sequencing and a computational pipeline
- Increases depth at the selected loci
- Does not capture unknown or highly divergent alleles