VL10 Genomics Flashcards
What is de novo assembly in genomics?
De novo assembly involves piecing together a genome from sequencing reads without a reference genome, relying on overlaps between reads that share identical sequence.
What are the two main types of sequencing reads used in de novo assembly?
Long-read data (e.g., Sanger, PacBio), typically assembled with overlap-layout-consensus
Short-read data (e.g., Illumina), typically assembled with de Bruijn graphs
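The overlap step of overlap-layout-consensus can be illustrated with a minimal sketch. The function name `overlap`, the `min_length` parameter, and the example reads are assumptions made for illustration; real assemblers also tolerate sequencing errors, which this exact-match version does not.

```python
def overlap(a, b, min_length=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Only exact matches of at least `min_length` bases count; otherwise return 0.
    """
    max_len = min(len(a), len(b))
    # Try the longest possible overlap first and shrink until one matches.
    for n in range(max_len, min_length - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


# Hypothetical reads whose ends share the sequence "TGCA".
read_a = "ACGTTGCA"
read_b = "TGCAGGT"
n = overlap(read_a, read_b)
# Merge the two reads by appending the non-overlapping part of read_b.
merged = read_a + read_b[n:]
print(n, merged)
```

Here the two reads are joined through their shared end, which is the basic operation repeated across all read pairs before the layout and consensus steps.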
What is the biggest challenge in de novo assembly?
The biggest challenge is dealing with repeats, such as transposable elements, centromeres, and telomeres, which can lead to fragmented assemblies.
What does a high-quality genome provide?
A high-quality genome serves as a reference for other genomes from the same or closely related species and is essential for analyzing genetic variation within species.
E.g., it can be used as the reference for re-sequencing.
What is structural genome annotation?
Structural annotation involves identifying the locations of genes, repeats, regulatory elements, and non-coding RNAs within the genome.
What is functional genome annotation?
Functional annotation determines the function of genes and other genomic elements, often using data from related species and RNA-seq data.
What is re-sequencing in genomics?
Re-sequencing involves mapping sample reads to a reference genome to identify genetic variants and support genome annotation.
What are the benefits of transcriptomics/RNA-seq?
RNA-seq sequences expressed genes, which helps to:
* measure gene expression levels,
* detect alternative splicing,
* support genome annotation.
What is a pangenome?
A pangenome includes genetic information from all sequenced individuals of a species, representing the genetic diversity within the species.
There is no fixed number of genomes that a pangenome must include.
What are runs of homozygosity (ROH)?
- ROH are tracts of consecutive homozygous genotypes
- they arise when two copies of an ancestral haplotype are inherited from a common ancestor
- longer ROH point to recent common ancestors, because recombination has had little time to break the haplotype up
- shorter ROH point to distant common ancestors.
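Finding tracts of consecutive homozygous genotypes can be sketched as a simple scan. This is a simplified illustration, not how production ROH callers work: the function name `find_roh`, the `min_length` threshold (counted in markers rather than base pairs), and the genotype data are all assumptions, and no heterozygous calls are tolerated inside a run.

```python
def find_roh(genotypes, min_length):
    """Return (start, end) index pairs, end exclusive, of homozygous runs.

    `genotypes` is a list of (allele1, allele2) tuples ordered along a chromosome.
    Only runs of at least `min_length` consecutive homozygous markers are kept.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, (a, b) in enumerate(genotypes):
        if a == b:
            if start is None:
                start = i
        else:
            # A heterozygous genotype ends the current run, if any.
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the data.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes)))
    return runs


# Hypothetical genotypes at nine consecutive markers.
genotypes = [
    ("A", "A"), ("C", "C"), ("G", "G"), ("A", "T"), ("C", "C"),
    ("T", "T"), ("G", "G"), ("A", "A"), ("C", "T"),
]
roh = find_roh(genotypes, min_length=3)
print(roh)
```

In this toy data the run covering markers 4 to 7 is longer than the run covering markers 0 to 2, which under the reasoning above would suggest a more recent common ancestor for that segment.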
What are the main methods used for sequencing genomes?
The main platforms include Illumina for short reads and PacBio for long reads, each with different strengths and weaknesses for genome assembly.
What is the principle of the de Bruijn graph in genome assembly?
The de Bruijn graph approach breaks reads into k-mers and builds a graph in which each edge is a k-mer and each node is the (k-1)-mer overlap shared by consecutive k-mers; the target sequence is assembled by following a path through the graph.
Biggest problem: duplications
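A minimal sketch of de Bruijn graph construction, assuming error-free reads and the common convention that nodes are (k-1)-mers and edges are k-mers. The function name `de_bruijn_graph` and the example reads are hypothetical, and the sketch only builds the graph; it does not walk it to produce contigs.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to.

    Every k-mer found in a read contributes one edge, from its first k-1
    bases (prefix) to its last k-1 bases (suffix).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical reads that share the k-mer "GTC".
reads = ["ACGTC", "GTCA"]
graph = de_bruijn_graph(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because the shared k-mer "GTC" maps to the same edge in both reads, the two reads are connected without ever being compared pairwise. A duplicated sequence has the same effect in an unwanted way: its k-mers collapse onto the same nodes, so the graph branches and the path through it becomes ambiguous.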
How are repeats in the genome a challenge for sequencing?
Repeats longer than the read length can lead to fragmented assemblies, making it difficult to accurately assemble the genome.
What is the role of paired-end and mate-pair reads in genome assembly?
Paired-end and mate-pair reads help link contigs into scaffolds by providing information about the relative positions of sequences, aiding in the assembly process.
Paired-end reads: Short DNA fragments (200-800 bp) are sequenced from both ends, providing two reads per fragment. These reads help link adjacent contigs by providing information about the distance and orientation between them.
Mate-pair reads: Longer DNA fragments (2-5 kb or more) are circularized, cut, and sequenced from both ends, producing reads that are further apart than paired-end reads. These longer distance reads help link contigs over larger genomic regions, aiding in assembling repetitive and complex regions.
What is the importance of the N50 metric in genome assembly?
The N50 metric summarizes the contiguity of a genome assembly: it is the length at which half of the assembled genome is contained in contigs or scaffolds of that length or longer.
A higher N50 value indicates a more contiguous assembly, with fewer gaps and larger pieces of the genome assembled; it does not by itself show that those pieces are correct.
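The N50 definition can be made concrete with a short calculation. This sketch assumes N50 is computed against the total assembled length (not an estimated genome size); the function name `n50` and the contig lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Walk through the lengths from longest to shortest and return the first
    length at which the running total reaches half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths in kb; they sum to 290, so half is 145.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 80 + 70 = 150 >= 145, so N50 is 70
```

Merging the two shortest contigs into one of length 50 would leave the N50 unchanged here, which shows that the metric is driven by the largest pieces of the assembly.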