DNA Basics Flashcards
DNA stands for
deoxyribonucleic acid
number of base pairs in nuclear genome
~3 billion
base pairs in mt genome
~16.6 kb (hundreds to thousands of copies per cell)
nucleotide composition
sugar (deoxyribose), phosphate on 5’ of sugar, nitrogenous base on 1’ of sugar
nucleoside composition
sugar and nitrogenous base
purines
A and G (double ring)
pyrimidines
T and C (single ring)
DNA backbone
phosphate (5’) - sugar (3’)
G pairs with what? with how many bonds?
C, 3
A pairs with what? with how many bonds?
T, 2
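The two pairing cards above can be sketched in Python. The names `reverse_complement` and `hydrogen_bonds` are illustrative and not taken from any library, and the sketch assumes an uppercase sequence containing only A, C, G, and T.

```python
# Watson-Crick partners: G pairs with C, A pairs with T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per base pair: 3 for G-C, 2 for A-T.
BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))

def hydrogen_bonds(seq: str) -> int:
    """Count the hydrogen bonds holding this strand to its complement."""
    return sum(BONDS[base] for base in seq)

print(reverse_complement("GATTC"))  # GAATC
print(hydrogen_bonds("GATTC"))      # 3 + 2 + 2 + 2 + 3 = 12
```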
what type of bond holds double helix together?
hydrogen
which bases are more prevalent in gene rich areas?
GC
nucleosome
147 bp wrapped around 8 histones, plus some linker DNA to next nucleosome
solenoid
6-8 nucleosomes per turn. A fifth histone type (H1) binds the linker segments in the middle
coding DNA amount
(produces protein) ~1.2% of genome. ~20,000 genes
does number of genes correspond to chromosome size?
no
what percent of nuclear genome is highly conserved?
~5%
five ways to get DNA duplication
unequal crossover (homologs or sister chromatids), transposons, ancestral cell fusion, genome duplication, translocation
retrotransposon
uses a reverse transcriptase
DNA transposon
migrates without copying. Just excised and reinserted elsewhere
LINES
autonomous transposons. Retrotransposons. 20% of genome. Usually integrate into gene poor areas
SINES
non autonomous retrotransposon. Alu is a SINE- 10% of genome, most abundant sequence
Satellite DNA
high copy number tandem repeats
mini-satellites
10-60 bp repeats, up to 20 kb
micro-satellites
1-4 bp repeats, up to 1 kb
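As a rough illustration of what a micro-satellite looks like in sequence data, the sketch below scans a string for a 1-4 bp unit repeated in tandem. The function name `find_microsatellites` and the `min_repeats` threshold are assumptions made for this example, not a standard definition.

```python
import re

def find_microsatellites(seq: str, min_repeats: int = 4):
    """Return (start, unit, copies) for each tandem run of a 1-4 bp unit."""
    # Group 1 is the repeat unit; \1{n,} demands at least n further copies.
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

print(find_microsatellites("TTGACACACACAGGT"))  # [(3, 'AC', 4)]
```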
nonprocessed pseudogene
contains introns, UTRs, etc. Matches full gene sequence. normally found near functional gene.
processed pseudogene
only contains coding sequence
is mt DNA more or less prone to error than nDNA?
more error prone
characteristics of mtDNA
highly conserved, no introns, ~66% coding, circular, ds (except for triple stranded D-loop), 37 genes (mostly tRNAs and oxidative phosphorylation proteins)
most proteins in the mitochondria are coded in?
the nucleus
why do we call dna replication semi conservative?
two newly synthesized molecules contain one strand from the original, and one new
replication forks
replication complex binds to origin of replication (many per chromosome). Replication proceeds in both directions from origin
what does DNA polymerase require? How does it get it?
a free 3’ OH on a ds molecule. Primase (an RNA polymerase) makes a short RNA primer so DNA polymerase can start
how does cell replicate lagging strand?
Okazaki fragments. 100-1000 bases. Added as fork opens. Ligated together.
what proteins are needed for DNA replication?
topoisomerase, ligase, helicase, DNA polymerase, primase, ss binding proteins
is replication of mt DNA uni or bidirectional?
uni
what is replicative segregation in mitochondria?
during mitochondrial division, multiple copies of mtDNA replicate and sort randomly. During cell division, multiple mitochondria sort randomly.
telomere composition/construction
repeats of TTAGGG. G-rich 3’ overhang folds back to create a T-loop.
telomerase
TERC serves as an RNA template for telomeric repeats. TERT is reverse transcriptase that adds the bases. T-loop still created
how does a ribonucleic acid differ from a deoxyribonucleic acid?
RNA has an OH at 2’ of sugar
how is U different from T?
T has CH3
c-value paradox
genome size (C-value) does not correspond to organismal complexity
what percent of human genome is highly conserved?
~10% (including mtDNA?)
definition genotype
genetic constitution
phenotype
chemical, physiological, and morphological characteristics as defined by genome and environment
four examples of ncRNAs
snoRNA, miRNA, snRNA, piRNA
where does transcription start? what number?
Beginning of exon 1. In coding-DNA numbering this is a negative position (c.-N), because numbering starts at the translation start
where does translation start? what number?
at the ATG start codon, usually within exon 1, but not always. The A of the ATG is +1
where is the poly A tail signal?
near the end of the last exon, within the 3’ UTR
histone modification
certain aa in histone tails can be acetylated or methylated. Alters charge of histones, and therefore configuration, and therefore openness of chromatin
example of histone modification disease
Kabuki syndrome
epigenetics
enduring changes in gene expression that do not involve sequence modifications
what DNA bases are methylated?
C’s immediately preceding G’s (CpG dinucleotides)
DNA methylation usually represses transcription. How?
Directly- some proteins can’t bind. Indirectly- contributes to tighter conformation of chromatin
example of DNA methylation disease?
Lynch syndrome, many cancers
promoters
usually upstream of 5’ UTR. Highly heterogeneous, though some common motifs (TATA box) have been noted.
enhancers and silencers
DNA sequence elements that can act at a distance from a gene to regulate transcription. Can be up or downstream
TADs
topologically associating domains. Discrete chromosome regions whose sequences interact with each other more often than with sequences outside the domain
transcription initiation
requires promoter and transcription factors
basal transcription apparatus
guides RNA polymerase to transcription start site. includes RNA polymerase and transcription factors
transcription factors
sequence-specific DNA binding proteins that bind close to promoter
does RNA polymerase need a primer?
No
how is RNA pol released from DNA?
exonuclease begins removing bases in 5’-3’ direction until it catches up with RNA pol
5’ capping
methylated nucleoside (7-methylguanosine) added to 5’ end by a 5’-5’ triphosphate linkage
how is 3’ end of mRNA determined?
AAUAAA or variant in 3’ UTR signals cleavage about 15-30 bases downstream from itself. After cleavage, poly A tail added.
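The rule on this card can be turned into a small lookup. This is only a sketch: the function name `cleavage_window` is made up for illustration, the offset is measured from the end of the signal, and the last AAUAAA in the string is used; both of those choices are assumptions, and signal variants are ignored.

```python
def cleavage_window(mrna: str, signal: str = "AAUAAA"):
    """Return (start, end) indices of the expected cleavage region, or None."""
    hit = mrna.rfind(signal)  # assumption: use the last occurrence
    if hit == -1:
        return None
    signal_end = hit + len(signal)
    # Cleavage is expected roughly 15-30 bases downstream of the signal.
    return signal_end + 15, signal_end + 30

transcript = "GG" + "AAUAAA" + "C" * 40  # toy transcript, not a real gene
print(cleavage_window(transcript))  # (23, 38)
```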
start and end of introns
GT(U)….AG
branch site
conserved intronic sequence. Has an invariable A. Provides first nucleophilic attack for splicing mechanism. Also a binding site for spliceosome components (U2 snRNP)
RNA editing
post-transcriptional base changes. ex, U to C, C to U, A to I
microRNA
ss RNA, ~20 b. in cytoplasm. Guides RISC. Binds to 3’ UTR and down regulates (degradation if perfect match, repressing translation if imperfect match)
nonsense mediated decay pathway
identifies premature stop codons. depends on proteins bound to splice areas, so does not work on intronless genes
mtDNA transcription
bidirectional (both strands are transcribed, unlike the unidirectional replication). Large multigenic transcripts
structure of amino acid
amino group (+charge), carboxylic acid group (-charge), side chain.
bonds between amino acids
peptide bonds. condensation reaction between carboxyl group and amino group
do mitochondria need tRNA from the nucleus?
No, makes all its own tRNA
ribosome composition
~80 proteins and 4 RNA molecules
ribosome binding sites
Positions itself until it finds AUG, then AUG tRNA binds P site, and tRNA for second codon binds A site. Bond created, second tRNA slides to P site, third tRNA enters A site
translation termination
when termination codon encountered, protein release factor enters A site
asymmetric exon
length not divisible by 3
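A one-line check makes the definition concrete. The function name `is_asymmetric` and the exon lengths are hypothetical; the point is that losing or duplicating an asymmetric exon shifts the reading frame of everything downstream.

```python
def is_asymmetric(exon_length: int) -> bool:
    """True when the exon length is not a whole number of codons."""
    return exon_length % 3 != 0

for length in (120, 121):  # hypothetical exon lengths in bp
    print(length, "asymmetric" if is_asymmetric(length) else "symmetric")
```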
four levels of protein organization
sequence, initial folding, overall three dimensional shape, interaction with other proteins
protein folding stabilization
achieved by covalently and non-covalently bonded entities. Chaperones also help stabilize and fold.
protein degradation process
ubiquitinated by ubiquitin ligase. Proteasomes then degrade protein.
genomic imprinting
physiologic form of gene regulation that causes a subset of genes to be expressed from only 1 of the 2 parental chromosomes
mechanism of genomic imprinting
differential methylation of imprinting control centers
mechanism of x inactivation
Inactivation initiated at X inactivation center, Xic at Xq13. Xic encodes a long non coding RNA, XIST. XIST binds to inactive X and recruits proteins to organize chromatin into inactive state
what genes escape X inactivation?
Two pseudoautosomal regions plus some others
how does an X:autosome translocation affect x inactivation
chromosome with the Xic becomes inactivated. This can spread into the autosomal region. The x segment on the autosomal chromosome does not become inactivated
skewing of x inactivation
when one X is abnormal, it is preferentially inactivated. When there is an X:autosome translocation, the normal X is preferentially inactivated.
single strand repairs
base excision repair, single strand break repair, nucleotide excision repair, base mismatch repair, direct reversal of damage
double strand repairs
homologous recombination mediated repair, nonhomologous end joining
variant
any sequence change as compared to reference
polymorphism
DNA variant that is prevalent at >1%
copy number variants
200 bp - 2 Mb. recent discovery. common.
SNP
1/300 nucleotides is polymorphic. numerically, most abundant type of genetic variant. usually biallelic
origin of SNPs
ancestral chromosome segments (rather than recurrent mutation)
tandem repeat polymorphisms source
relatively recent. 1) minisatellite diversity from mispairing in meiosis. 2) microsatellite diversity from polymerase slippage during replication
what’s better for distinguishing between individuals- SNPs or tandem repeat polymorphisms?
tandem repeats- there are more alleles
sources of genetic variation
errors of replication, errors of recombination during repair, meiosis, and mitosis, and DNA damage
why are CpG sequences mutational hotspots?
If C is methylated then deaminated, it becomes T, which is not always recognized for repair
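A minimal sketch of the hotspot idea: locate CpG dinucleotides, then show the C to T change that deamination of a methylated C would leave on that strand. The function names are illustrative, and the code does not model whether a given C is actually methylated.

```python
def cpg_positions(seq: str):
    """Return the index of every C that is immediately followed by G."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def deaminate(seq: str, pos: int) -> str:
    """Replace the C of a CpG at `pos` with T (5-methyl-C deaminates to T)."""
    if seq[pos:pos + 2] != "CG":
        raise ValueError("position is not the C of a CpG")
    return seq[:pos] + "T" + seq[pos + 1:]

print(cpg_positions("ACGTCGA"))  # [1, 4]
print(deaminate("ACGTCGA", 1))   # ATGTCGA
```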
DNA damage forms
deamination, depurination, ROS, aberrant DNA methylation, radiation
mismatch repair (MMR)
checks newly synthesized DNA for mismatched base pairs or small indels
microsatellite instability could indicate that what repair pathway isn’t functioning?
MMR (ex. Lynch syndrome)
base excision repair
main repair mechanism for most common DNA damage. Glycosylases cleave sugar-base bond. Endonuclease cuts and removes sugar and phosphate. DNA polymerase and ligase fill and seal the gap.
nucleotide excision repair
recognizes distortion of helix, removes and resynthesizes 25-30 bases with pol and ligase
homologous recombination as repair for ds breaks
Occurs in S phase, sister chromatid used as template. Break resected to leave overhangs, strand invasion, DNA synthesis
nonhomologous end joining
ligase joins ds breaks without template
missense mutation
new amino acid
nonsense mutation
new stop codon
synonymous mutation
same amino acid
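The three cards above (missense, nonsense, synonymous) can be combined into one classifier for single-codon changes. This is a sketch using the standard nuclear genetic code; the function name `classify` is illustrative, and the "stop-loss" label for a change that removes a stop codon is an addition not covered by the cards.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # new stop codon
    if ref_aa == "*":
        return "stop-loss"    # not on the cards; included for completeness
    return "missense"         # new amino acid

print(classify("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify("GAA", "TAA"))  # nonsense   (Glu -> stop)
print(classify("GAA", "GTA"))  # missense   (Glu -> Val)
```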
dynamic mutations
copy number of microsatellites expands substantially between generations
transition substitution
purine to purine or pyrimidine to pyrimidine
transversion substitution
purine to pyrimidine or vice versa
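The transition and transversion cards reduce to one test: do the two bases belong to the same ring class? A minimal sketch, with an illustrative function name:

```python
PURINES = {"A", "G"}       # double ring
PYRIMIDINES = {"C", "T"}   # single ring

def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    if ref == alt:
        raise ValueError("ref and alt are the same base")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "T"))  # transition
print(substitution_type("A", "C"))  # transversion
```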
trinucleotide repeats are found where in genome?
anywhere. intron, exon, UTR
frameshift nomenclature
listing the first amino acid change and the number of amino acids before the stop codon
allele nomenclature
each allele in “[ ]”, alleles separated by “;”
nomenclature for introns
number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, like c.77+1G. number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, like c.78-1G
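The two intron forms on this card can be built with simple string formatting. The helper names `donor_side` and `acceptor_side` are made up for this sketch, which only reproduces the position pattern described above and is not a full nomenclature implementation.

```python
def donor_side(last_exon_nt: int, offset: int, base: str = "") -> str:
    """Position counted forward from the last nucleotide of the preceding exon."""
    return f"c.{last_exon_nt}+{offset}{base}"

def acceptor_side(first_exon_nt: int, offset: int, base: str = "") -> str:
    """Position counted backward from the first nucleotide of the following exon."""
    return f"c.{first_exon_nt}-{offset}{base}"

print(donor_side(77, 1, "G"))     # c.77+1G
print(acceptor_side(78, 1, "G"))  # c.78-1G
```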
conservative amino acid substitution
replacing with a similar aa
requirements for sanger sequencing
DNA pol, primers, dNTPs, ddNTPs
sanger sequence read length
up to ~1000 bases
major limitation of sanger
medium to large size deletions can be missed in heterozygous individuals
library preparation for NGS
fragment DNA, add adaptors, denature
reversible terminator sequencing
add fluorescently labeled terminator nucleotides (no 3’ OH). Take pic after addition. Restore bonding ability. repeat process. Number of cycles determines read length
pyrosequencing
enzyme reaction gives light when pyrophosphate (PPi) is naturally released during nucleotide addition. Add different dNTPs sequentially. Flash of light for base addition. No light means no addition. A run of identical bases gives proportionally more light.
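The readout can be mimicked with a toy simulation: nucleotides are dispensed in a fixed cycle, and the signal at each step equals the number of bases added. The function name `pyrosequence` and the A, C, G, T flow order are assumptions for illustration; real instruments and chemistries differ.

```python
def pyrosequence(new_strand: str, flow_order: str = "ACGT"):
    """Return (nucleotide, signal) per flow for the strand being synthesized."""
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains a base that is never dispensed")
    flows = []
    pos = 0   # next base of the new strand still to be added
    step = 0  # how many nucleotides have been dispensed so far
    while pos < len(new_strand):
        nt = flow_order[step % len(flow_order)]
        run = 0
        # A homopolymer run is added in one flow, giving a stronger signal.
        while pos < len(new_strand) and new_strand[pos] == nt:
            run += 1
            pos += 1
        flows.append((nt, run))  # run == 0 means no light
        step += 1
    return flows

print(pyrosequence("AAGT"))  # [('A', 2), ('C', 0), ('G', 1), ('T', 1)]
```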
mapping
The computational process of identifying the specific region of a reference genome from which an individual sequenced DNA template originated
mappable reads
Very short DNA sequences that can be determined to originate from a single location in the genome
mappable yield
The number of bases generated by a DNA-sequencing instrument that can be mapped to the reference genome
average depth of coverage
The average number of times each base in the genome was sequenced, as a function of the distribution and number of sequence reads that map to the reference genome. Make sure to differentiate between raw and post-alignment coverage.
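As a worked example, average depth is mapped bases divided by genome size. The read count below is hypothetical; the ~100-base read length and ~3 billion bp genome size come from other cards in this set. Passing only mapped reads gives post-alignment coverage, while passing all reads gives raw coverage.

```python
def average_depth(read_count: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base of the genome is covered."""
    return read_count * read_length / genome_size

# Hypothetical run: 900 million mapped reads of 100 bases on a 3 Gb genome.
print(average_depth(900_000_000, 100, 3_000_000_000))  # 30.0
```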
benefits of NGS
fast, high throughput, better ability to detect indels, better ability to detect mosaicism
main limitation of NGS
short read lengths (~100 bases).
can NGS sequencing detect Robertsonian translocations?
No
What mechanism leads to minisatellite diversity (tandem repeat polymorphisms called “variable number tandem repeats”, VNTRs)?
mispairing in meiosis
What mechanism leads to microsatellite diversity (tandem repeat polymorphisms called “short tandem repeat polymorphisms”, STRPs)?
polymerase slippage during replication
Nucleolus Organizer Region
The NORs are located on the short arms of the acrocentric chromosomes 13, 14, 15, 21 and 22, at the genes RNR1, RNR2, RNR3, RNR4, and RNR5 respectively.[1] These regions code for 5.8S, 18S, and 28S ribosomal RNA.[1] The NORs are “sandwiched” between the repetitive, heterochromatic DNA sequences of the centromeres and telomeres.
Nucleolus
largest structure in the nucleus of eukaryotic cells.[1] It is best known as the site of ribosome biogenesis