Vocabulary Words For Exam 1 Flashcards
Heteroplasmy
The presence of two or more different mtDNA variants within the same cell
Histones
any of a group of basic proteins found in chromatin
Heterochromatin
tightly compacted chromosome material in which gene activity is modified or suppressed
Euchromatin
Loosely packed chromosome material that is transcriptionally active
Nucleosome
the fundamental repeating unit of chromatin
Constitutive heterochromatin
chromosome regions (e.g. around centromeres and telomeres) that remain permanently condensed throughout the cell cycle
Centromeres
the region of a chromosome to which the microtubules of the spindle attach during cell division
Telomeres
a protective structure at each end of a linear chromosome, made of repetitive DNA (TTAGGG repeats in humans) bound by proteins
Segmental duplications
blocks of DNA that typically share more than 90% sequence identity and occur at more than one site within the genome
Phosphodiester bonds
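a covalent bond that links the 3’ carbon of one nucleotide’s sugar to the 5’ carbon of the next nucleotide’s sugar through a phosphate group, forming the sugar-phosphate backbone of a polynucleotide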
Polarity
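Direction: each DNA or RNA strand has a 5’ end and a 3’ end, and the two strands of a double helix run in opposite (antiparallel) directions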
DNA polymerase
an enzyme that synthesizes new DNA strands from a template by adding nucleotides to the 3’ end of a growing strand
Semiconservative
during DNA replication, the two parental strands separate and each serves as a template, so each daughter molecule contains one original strand and one newly synthesized strand
Base Pair
bp: 1 bp
Kilobase
Kb: 1,000 bp
Megabase
Mb: 1,000,000 bp (1,000 Kb)
Gigabase
Gb: 1,000,000,000 bp (1,000 Mb)
Terabase
Tb: 1,000,000,000,000 bp (1,000 Gb)
Petabase
Pb: 1,000,000,000,000,000 bp (1,000 Tb)
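The units above are successive powers of 1,000. As a minimal sketch (the function name `format_bases` and the unit table are hypothetical, written only for illustration), a base-pair count can be expressed in the largest unit it reaches:

```python
# Hypothetical unit table: each unit is 1,000 times the next smaller one.
UNITS = [("Pb", 10**15), ("Tb", 10**12), ("Gb", 10**9), ("Mb", 10**6), ("Kb", 10**3)]


def format_bases(n_bp: int) -> str:
    """Express a base-pair count in the largest unit it reaches."""
    for suffix, size in UNITS:
        if n_bp >= size:
            return f"{n_bp / size:g} {suffix}"
    return f"{n_bp} bp"


for count in (999, 1_500, 2_000_000, 3_000_000_000):
    print(count, "->", format_bases(count))
```

For example, `format_bases(1_500)` gives `1.5 Kb` and `format_bases(2_000_000)` gives `2 Mb`.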
Denaturation
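in PCR, heating the reaction (typically to about 94–98 °C) so that double-stranded DNA separates into single strands; more generally, the loss of a molecule’s native three-dimensional structure, such as a protein unfolding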
Annealing
the pairing of complementary single-stranded DNA or RNA through hydrogen bonds to form a double-stranded molecule; in PCR, the step in which the reaction is cooled so that primers bind to the template
Extension
the PCR step in which DNA polymerase adds free nucleotides (dNTPs) to the 3’ end of each primer to build the complementary DNA strand
PCR
Polymerase Chain Reaction
a lab technique for rapidly producing millions to billions of copies of a specific segment of DNA
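Each PCR cycle (denaturation, annealing, extension) can at most double the number of copies, which is why roughly 30 cycles are enough to reach the billions. A rough sketch, assuming a constant per-cycle efficiency (the `efficiency` parameter is a simplifying assumption; real reactions slow down and plateau in later cycles):

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds if each round multiplies copies by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; it is held constant here for simplicity.
    """
    return initial_copies * (1 + efficiency) ** cycles


for cycles in (10, 20, 30):
    print(cycles, f"{pcr_copies(1, cycles):,.0f}")
```

With perfect efficiency, 30 cycles starting from a single template give 2^30, about 1.07 billion copies.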
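Regulatory Promoter
located upstream of the core promoter, which is the binding site for the transcription apparatus
contains binding sites for transcription factors that regulate the rate of transcription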
Transcription factors
proteins that help turn specific genes ‘on’ or ‘off’ by binding to nearby DNA
Enhancers
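DNA sequences that bind transcription factors and can increase transcription from distal locations, sometimes far upstream or downstream of the gene they regulate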
Gene
fundamental unit of heredity
TSS
transcription start site
5’ cap
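a modified guanine nucleotide (7-methylguanosine) added to the 5’ end of eukaryotic mRNA; protects that end from degradation and helps the ribosome bind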
3’ poly (A) Tail
a string of adenine nucleotides added to the 3’ end of eukaryotic mRNA to protect that end and stabilize the transcript
Degenerate
redundant: most amino acids are encoded by more than one codon (61 sense codons specify 20 amino acids)
Synonymous
a change in the DNA sequence that does not change the amino acid
Wobble
the third base of a codon can often vary without changing the amino acid, because it pairs less strictly with the tRNA anticodon
Non-synonymous
a change in the DNA sequence that changes the amino acid
Nonsense
a change in the DNA sequence that creates a premature stop codon
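Synonymous, non-synonymous, and nonsense changes can be told apart by translating the codon before and after the change. A minimal sketch using the standard genetic code written as mRNA codons (the helper name `classify_substitution` is hypothetical):

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa  # "*" marks a stop codon
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0, 1, or 2) of an mRNA codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "non-synonymous"


print(classify_substitution("CUU", 2, "C"))  # synonymous (Leu -> Leu)
print(classify_substitution("GAG", 1, "U"))  # non-synonymous (Glu -> Val)
print(classify_substitution("UAC", 2, "A"))  # nonsense (Tyr -> stop)
```

Changing the third base of `GGU` to any other base is synonymous (all four codons encode glycine), an example of wobble.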
AUG
start codon
methionine
Reading frame
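the code is read in non-overlapping groups of three nucleotides (codons), starting from a fixed point such as the AUG start codon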
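Because the code is read in threes from the start codon, inserting or deleting a single base shifts every downstream codon. A sketch (the helper `translate_from_start` and the example sequences are hypothetical) that translates from the first AUG to the first stop codon:

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa  # "*" marks a stop codon
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from_start(mrna: str) -> str:
    """Translate codon by codon from the first AUG until a stop codon or the end."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # step through the reading frame
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_from_start("GGAUGGCAUUUUAAGC"))  # MAF
print(translate_from_start("GGAUGCAUUUUAAGC"))   # MHFK (one G deleted: frameshift)
```

Removing a single G changes every codon after the deletion, so `MAF` becomes `MHFK` and the stop codon is no longer in frame.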
Template
DNA used as input to create a library
Library
template DNA that has undergone all the manipulations needed to enable it to be sequenced
Adapters
specific sequences added to the 5’ and 3’ ends of template DNA that are complementary to the oligos on the flow cell
Indexes/barcodes/tags
A unique DNA sequence ligated to fragments within a sequencing library for downstream in silico sorting and identification
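Sorting reads by index is essentially a lookup. A minimal sketch, assuming (hypothetically) that each index sits at the start of the read and must match exactly; in many Illumina workflows the index is instead sequenced in a separate index read, and real demultiplexing software often tolerates a mismatch:

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Sort reads into samples by exact match of the leading index sequence.

    reads: iterable of read sequences.
    barcodes: dict mapping a (hypothetical) sample name to its index sequence;
    all indexes are assumed to have the same length.
    """
    index_to_sample = {index: sample for sample, index in barcodes.items()}
    index_length = len(next(iter(barcodes.values())))
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:index_length]
        if prefix in index_to_sample:
            # Strip the index so only the insert goes on to alignment.
            bins[index_to_sample[prefix]].append(read[index_length:])
        else:
            bins["undetermined"].append(read)
    return dict(bins)


barcodes = {"sample_A": "ACGT", "sample_B": "TTAG"}
reads = ["ACGTGGGCC", "TTAGCCAAT", "GGGGAAAA"]
print(demultiplex(reads, barcodes))
```

Reads whose index matches no sample are kept aside as "undetermined" rather than discarded.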
Flow cell
a glass slide with one, two, four, or eight physically separated lanes. Each lane is coated with a lawn (nano-well) of surface-bound, adapter-complementary oligos
Bridge amplification
the process where a single molecule is amplified to form a cluster
Cluster
clonal grouping of template DNA bound to the surface of a flow cell
Fluor
a chemical structure that emits light at a characteristic wavelength when excited by a laser
SBS
sequencing by synthesis
one nucleotide is added at a time
biological nanopores
alpha-hemolysin is a heptameric protein pore with an inner diameter of a few nanometers, roughly 100,000 times smaller than the diameter of a human hair
Homopolymers
a run of the same nucleotide repeated several times in a row (e.g. AAAAAA)
a common source of sequencing errors: the signal barely changes along the run, so the instrument cannot easily tell how many identical bases it read
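Homopolymer runs are easy to locate before interpreting base calls in those regions. A small sketch (the helper `homopolymer_runs` is hypothetical, and the minimum run length of 4 is an arbitrary choice):

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_length: int = 4):
    """Return (0-based start, base, length) for every run of one base >= min_length."""
    runs, position = [], 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


print(homopolymer_runs("ACGTTTTTGCAAAAC"))
```

Here the sequence contains a run of five Ts starting at position 3 and a run of four As starting at position 10.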
ZMWs
zero-mode wave guides
nano-wells used in PacBio
SMRT sequencing
single molecule real time
indels
an insertion or deletion of bases in the genome of an organism
FASTA
standard file format used when quality is not needed
presents only the sequence itself
FASTQ
standard file format used when quality is needed
contains the sequence and the estimated base quality
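A FASTQ record spans four lines: an `@` header, the sequence, a `+` separator, and one quality character per base. A minimal sketch that parses records and converts them to FASTA (which keeps only a `>` header and the sequence), assuming the common Phred+33 quality encoding; the function names are hypothetical:

```python
def parse_fastq(text: str):
    """Return (name, sequence, qualities) tuples from 4-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete 4-line records")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Phred+33: each quality character's ASCII code minus 33 is its Q score.
        records.append((header[1:], seq, [ord(c) - 33 for c in qual]))
    return records


def to_fasta(records) -> str:
    """Keep only the name and sequence; FASTA has no quality line."""
    return "\n".join(f">{name}\n{seq}" for name, seq, _ in records)


records = parse_fastq("@read1\nACGT\n+\nII?5\n")
print(records)
print(to_fasta(records))
```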
QPhred
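Q = -10 × log10(P(error)), where P(error) is the probability that a base call is wrong (e.g. Q20 = 1% error, Q30 = 0.1% error)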
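The Phred formula converts in both directions. A short sketch (function names are hypothetical):

```python
import math


def phred_from_error(p_error: float) -> float:
    """Q = -10 * log10(P(error))."""
    return -10 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """P(error) = 10 ** (-Q / 10), the inverse of phred_from_error."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = error_from_phred(q)
    print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.2%}")
```

Each increase of 10 in Q means a tenfold lower chance that the base call is wrong.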
Alignment
a set of columns, each containing a set of bases that are all related to each other by some alignment relation
Smith-Waterman
one of two main alignment algorithms
finds the optimal local alignment using dynamic programming
good for Sanger reads but too slow for high-throughput data
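The core of Smith-Waterman is a dynamic-programming table in which no cell may drop below zero, so the best alignment can start and end anywhere. A score-only sketch, assuming simple scoring of +2 for a match, -1 for a mismatch, and -2 per gap position (illustrative values, not a standard); filling the full len(a) × len(b) table for every read is what makes it too slow for high-throughput data:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diagonal, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TTACGTTT", "ACGT"))
```

The read `ACGT` matches a stretch inside `TTACGTTT` perfectly, so the local score is 4 × 2 = 8 even though the flanking Ts do not align.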
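Burrows-Wheeler Transform
reversibly rearranges the reference sequence (via its sorted rotations, closely related to a suffix array) into a compact index that allows fast lookup of short exact matches (seeds, or k-mers)
match the seed of the read to the reference
extend the seed to a full alignment
BWA (WGS, BWT-based) and STAR (RNA-seq, built on an uncompressed suffix array)
among the most widely used aligners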
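A naive sketch of the transform itself: append a `$` end marker, sort all rotations, and keep the last column. Sorting every rotation is only practical for short strings; production aligners such as BWA build the transform from a suffix array and query it through an FM-index, which this sketch does not implement. The function names are hypothetical:

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations ('$' marks the end and sorts first)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str) -> str:
    """Rebuild the original text by repeatedly prepending the BWT column and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(char + row for char, row in zip(last_column, table))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]


transformed = bwt("ACAACG")
print(transformed)
print(inverse_bwt(transformed))
```

`bwt("ACAACG")` gives `GC$AAAC`, and `inverse_bwt` recovers `ACAACG`, showing that the transform loses no information about the reference.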
Minimap
long-read alignment algorithm
optimal local alignment
an alignment giving the highest score
variants
differences between our sample and the reference
SNPs
single nucleotide polymorphisms
SAM
Sequence Alignment/Map
11 mandatory fields
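Each SAM alignment line has 11 tab-separated mandatory fields, optionally followed by tags. A minimal parsing sketch (the helper name and example line are hypothetical; header lines beginning with `@` should be skipped before calling it):

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("SAM alignment lines need at least 11 tab-separated fields")
    record = {name: (int(value) if name in INTEGER_FIELDS else value)
              for name, value in zip(SAM_FIELDS, cols)}
    record["TAGS"] = cols[len(SAM_FIELDS):]  # optional TAG:TYPE:VALUE fields
    return record


line = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
print(parse_sam_line(line))
```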
BAM
binary (compressed) version of SAM
same 11 mandatory fields as SAM
Variant Call Format (VCF)
a standard text file format used in bioinformatics for storing sequence variations
8 fixed mandatory columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
BCF
binary version of VCF
BaseQualityScoreRecalibration (BQSR)
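uses a set of known variant sites and treats mismatches at all other positions in the sequence data as errors
builds a model of these errors from several covariates (e.g. reported base quality, machine cycle, sequence context) and uses it to recalibrate base quality scores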
Samtools mpileup
variant caller; not used much anymore
provides a summary of the coverage of mapped reads on a reference sequence at a single base pair resolution
GATK HaplotypeCaller
most widely used variant caller
calls potential variant sites per sample and saves the results in GVCF format
Single sample variant calling
variants are called for one sample at a time
Joint variant calling
variants are called across multiple samples together
is preferred
Hard filtering
filters variants using fixed thresholds on multiple annotation metrics (e.g. QD, FS, MQ)
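A hard filter applies a fixed threshold to each annotation and fails any variant outside it. A sketch with thresholds modeled on GATK's commonly recommended SNP hard filters (QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0); treat the values and the dictionary layout as assumptions rather than GATK's own interface:

```python
# Thresholds modeled on GATK's commonly cited SNP hard-filter recommendations (assumption).
SNP_FILTERS = {
    "QD": lambda value: value < 2.0,    # quality by depth
    "FS": lambda value: value > 60.0,   # Fisher strand bias
    "MQ": lambda value: value < 40.0,   # RMS mapping quality
    "SOR": lambda value: value > 3.0,   # strand odds ratio
}


def hard_filter(annotations: dict) -> str:
    """Return 'PASS' or a ';'-joined list of failed filters (missing annotations are skipped)."""
    failed = [name for name, fails in SNP_FILTERS.items()
              if name in annotations and fails(annotations[name])]
    return ";".join(failed) if failed else "PASS"


print(hard_filter({"QD": 15.0, "FS": 1.2, "MQ": 60.0, "SOR": 0.8}))
print(hard_filter({"QD": 1.5, "FS": 75.0, "MQ": 60.0}))
```

Unlike VQSR, these cutoffs do not learn from the data, so the same thresholds apply regardless of how a particular callset behaves.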
sensitivity
true positive rate
measures the proportion of positives that are correctly identified
(proportion of those who have some condition who are correctly identified as having the condition)
Specificity
true negative rate
measures the proportion of negatives that are correctly identified
(proportion of those who do not have the condition and are correctly identified as not having the condition)
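Both rates come straight from counts of true and false positives and negatives. A minimal sketch (helper names and counts are hypothetical):

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts from comparing a callset against a truth set.
print(sensitivity(tp=90, fn=10))    # 0.9
print(specificity(tn=980, fp=20))   # 0.98
```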
VariantQualityScoreRecalibration
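similar to BQSR but for variants: trains a model on variant annotations using known, high-confidence sites and assigns each variant a VQSLOD score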
truth set
a list of variants that is used to evaluate the quality of a variant callset
Tranches
slices of variants, ranked by VQSLOD (the log odds that a variant is real rather than an artifact)
90 tranche
few variants but also few false positives
misses some true positives but has fewer false positives
100 tranche
many variants but also many false positives
captures nearly all true positives but has many false positives
High tranche
if you want more variants and are willing to accept more false positives
Middle tranche
if you want to remove most false positives and are willing to lose some true variants
Low tranche
if you want only highly accurate variants with few false positives and are willing to miss many true positives
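Choosing a tranche amounts to choosing a VQSLOD cutoff that keeps a target fraction of truth-set variants. A simplified sketch (GATK's actual tranche computation is more involved; the helper names and scores are hypothetical):

```python
import math


def tranche_cutoff(truth_vqslods, target_sensitivity: float) -> float:
    """Highest VQSLOD cutoff that still keeps `target_sensitivity` of truth-set variants."""
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target_sensitivity must be in (0, 1]")
    scores = sorted(truth_vqslods, reverse=True)
    n_keep = math.ceil(target_sensitivity * len(scores))
    return scores[n_keep - 1]


def apply_tranche(variants, cutoff: float):
    """Keep the names of (name, vqslod) pairs scoring at or above the cutoff."""
    return [name for name, score in variants if score >= cutoff]


truth = [8.0, 6.5, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, -1.0, -3.0]
for tranche in (0.5, 0.9, 1.0):
    print(tranche, tranche_cutoff(truth, tranche))
```

The 100 tranche uses the lowest cutoff and so keeps the most variants (including false positives), while the 90 tranche raises the cutoff and discards more calls.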