Secondary Analysis Flashcards
What is a read?
An inferred sequence of base pairs (or base-pair probabilities) corresponding to all or part of a single DNA fragment. Read length depends on the sequencing platform; 150 bp is typical for short-read sequencing.
What is de novo sequence assembly?
The assembly of short nucleotide sequences into longer ones without the use of a reference genome.
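As a rough illustration of the idea, the sketch below joins two reads when the end of one matches the start of the other. The function names `overlap` and `merge` and the `min_len` cutoff are assumptions made for this example; real assemblers use far more elaborate graph-based methods and must handle sequencing errors.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a, b, min_len=3):
    """Join two reads on their suffix/prefix overlap, or return None if none exists."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


if __name__ == "__main__":
    # The reads share the 4-base overlap "GACC".
    print(merge("ATTAGACC", "GACCTGCC"))  # ATTAGACCTGCC
```

Repeating this merge step over many reads, always choosing the longest overlap first, gives a simple greedy assembly strategy.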
What are mapped reads?
Those reads from the sequenced sample that align directly to a single region (set of loci) on the reference genome.
What are unmapped reads?
Those reads that map nowhere on the reference genome.
What is BLAST?
Basic Local Alignment Search Tool. It is an algorithm and program for comparing primary biological sequence information. A BLAST search enables a researcher to compare a protein or nucleotide sequence (called a query) with a library or database of sequences, and to identify database sequences that resemble the query above a certain threshold.
What is a FASTA file?
A text file for representing nucleotide or amino acid sequences where nucleotides or amino acids are represented by single-letter codes.
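A minimal sketch of reading FASTA text, assuming the usual layout in which each record starts with a `>` header line followed by one or more sequence lines. The function name `parse_fasta` is made up for this example, and only the first word of the header is kept as the record ID.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record ID to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the first whitespace-delimited token as the ID.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            # Sequence line: may be wrapped over several lines.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1 demo nucleotide record\nACGT\nTTGA\n>seq2\nMKV\n"
    print(parse_fasta(example))  # {'seq1': 'ACGTTTGA', 'seq2': 'MKV'}
```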
What is sequence alignment?
A way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity.
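One way to make this concrete is to score the best global arrangement of two sequences with dynamic programming, in the style of the Needleman-Wunsch algorithm. The scoring values below (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration only; the sketch returns the score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of `a` and `b`.

    Only the previous row of the dynamic-programming table is kept in memory.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b` costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty prefix of `b`
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Best arrangement: ACGT over A-GT (three matches, one gap).
    print(global_alignment_score("ACGT", "AGT"))  # 2
```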
What is a consensus sequence (canonical sequence)?
It’s the calculated sequence of most frequent residues (nucleotide or amino acid) found at each position in a sequence alignment.
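A short sketch of that calculation, assuming the input sequences are already aligned and therefore all the same length. The function name `consensus` is hypothetical, and ties between equally frequent residues are not given any special treatment here.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue at each column of an alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("expected at least one sequence, all of equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


if __name__ == "__main__":
    print(consensus(["ACGT", "ACGA", "TCGT"]))  # ACGT
```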
What is a sequence motif?
A nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function.
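Motifs are often written as patterns with ambiguous positions, which map naturally onto regular expressions. The sketch below searches for a TATA-box-like pattern; the pattern string and the function name `find_motif` are illustrative assumptions, and `re.finditer` reports non-overlapping matches only.

```python
import re


def find_motif(sequence, pattern):
    """Return (start_index, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


if __name__ == "__main__":
    # Illustrative TATA-box-like pattern: TATA, then A or T, then A, then A or T.
    tata_like = "TATA[AT]A[AT]"
    print(find_motif("GGTATAAAAGG", tata_like))  # [(2, 'TATAAAA')]
```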
What is a FASTQ file?
A text file for representing a biological sequence and its corresponding quality scores.
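A minimal sketch of reading FASTQ text, assuming the common four-line record layout (`@` header, sequence, `+` separator, quality string) and Phred+33 quality encoding. The function name `parse_fastq` is made up for this example; files that wrap sequences over several lines or use a different quality offset would need extra handling.

```python
def parse_fastq(text, offset=33):
    """Parse four-line FASTQ records into (read_id, sequence, phred_scores) tuples."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (line.strip() for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        scores = [ord(ch) - offset for ch in qual]
        records.append((header[1:], seq, scores))
    return records


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII5!\n"
    print(parse_fastq(example))  # [('read1', 'ACGT', [40, 40, 20, 0])]
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the base is wrong.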
What is a SAM file?
Sequence Alignment Map, a text-based format originally designed for storing biological sequences aligned to a reference sequence. It has since been extended to also represent unmapped sequences.
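The sketch below splits one SAM alignment line into its mandatory tab-separated fields and checks flag bit 0x4, which the SAM format uses to mark a read as unmapped. The function name `parse_sam_line` and the returned dictionary layout are assumptions for this example; header lines (starting with `@`) and optional tags are not handled.

```python
def parse_sam_line(line):
    """Parse the mandatory fields of one SAM alignment line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 tab-separated fields")
    flag = int(fields[1])
    return {
        "qname": fields[0],   # read name
        "flag": flag,         # bitwise flag
        "rname": fields[2],   # reference sequence name, "*" if unavailable
        "pos": int(fields[3]),  # 1-based leftmost position, 0 if unavailable
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "qual": fields[10],
        "unmapped": bool(flag & 0x4),  # bit 0x4 set means the read is unmapped
    }


if __name__ == "__main__":
    mapped = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
    unmapped = "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
    print(parse_sam_line(mapped)["unmapped"])    # False
    print(parse_sam_line(unmapped)["unmapped"])  # True
```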
What is a BAM file?
Binary Alignment Map. It’s the compressed binary equivalent of the text-based SAM format.
What is a CRAM file?
Compressed Reference-oriented Alignment Map. A compressed columnar file for storing biological sequences aligned to a reference sequence.
What is a library?
It’s the DNA product extracted from biological samples and prepared for sequencing.
What is cDNA?
Copy DNA or complementary DNA. It is synthetic DNA that has been reverse transcribed from a specific mRNA using the enzyme reverse transcriptase. While genomic DNA contains both exons and introns, cDNA made from mature mRNA contains no introns, although it still includes the untranslated regions of the transcript.
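As a small illustration of the base pairing involved, the sketch below derives the first-strand cDNA sequence from an mRNA sequence, with both written 5' to 3'. The function name `first_strand_cdna` is hypothetical, and the code models only the sequence relationship, not the laboratory reaction.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') complementary to an mRNA (5'->3')."""
    try:
        complement = [RNA_TO_DNA_COMPLEMENT[base] for base in mrna.upper()]
    except KeyError as exc:
        raise ValueError(f"unexpected RNA base: {exc.args[0]}") from None
    # The strands are antiparallel, so reverse to read the cDNA 5'->3'.
    return "".join(reversed(complement))


if __name__ == "__main__":
    print(first_strand_cdna("AUGGCC"))  # GGCCAT
```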