Lecture 1 Flashcards
How much cheaper has DNA sequencing become since the Human Genome was finished in 2003?
> 150,000‐fold
What throughput was estimated for the NovaSeq in 2017?
2017 NovaSeq: up to 3 trillion (3 × 10^12) bp in 40 h
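A quick back-of-the-envelope check of the NovaSeq numbers above. This is only a sketch: the haploid human genome size of about 3.1 Gbp is an assumption that is not on the card, and the variable names are made up for illustration.

```python
# Figures taken from the card above
TOTAL_BP = 3e12   # up to 3 trillion bp per run
RUN_HOURS = 40    # run time in hours

# Assumption (not from the card): haploid human genome of about 3.1 Gbp
HUMAN_GENOME_BP = 3.1e9

# Throughput per hour and how many haploid genomes one run corresponds to
bp_per_hour = TOTAL_BP / RUN_HOURS
genome_equivalents = TOTAL_BP / HUMAN_GENOME_BP

print(f"{bp_per_hour:.2e} bp per hour")
print(f"about {genome_equivalents:.0f} haploid genome equivalents per run (1x coverage each)")
```

Note that a genome equivalent is 1x coverage only; real projects sequence each genome many times over, so the number of genomes per run is correspondingly lower.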
How many patients are in the Human atlas?
11,000
What does the Human Cell Atlas do?
An international consortium to characterize “all” human cells with genomic methods
What is Genome Project-write (GP‐write)?
An initiative to rewrite the human genome
What is the major change in biology in the last decade?
Genomics and sequencing are maybe the major change
What types of comparisons are made between healthy and diseased genomes?
1) Genetic mapping in families 2) Genome-wide association studies (GWAS) 3) Cancer sequencing
What is genome annotation?
Finding where the functional sites/elements are, i.e. where the genes and their regulatory elements are, when and where they are expressed, and how they interact at the cellular, tissue, and organism level
How many protein‐coding genes does the human genome contain?
20,448
How many non‐protein‐coding genes does the human genome contain?
23,997
What is the average size of the CDS?
1500 bp
What is the average size of the 5’UTR?
170 bp
What is the average size of the 3’UTR?
700 bp
The average 3’UTR is larger than the average 5’UTR?
True
What does a typical human protein-coding mRNA look like?
5’Cap, 5’UTR, CDS, 3’UTR, polyA tail
What percentage of the human genome codes for proteins?
1%
What percentage of the human genome codes for protein‐coding mRNAs?
1.5% (CDS plus 5’UTR and 3’UTR)
Order from largest to smallest: CDS, 5’UTR, 3’UTR
CDS, 3’UTR, 5’UTR
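The average sizes from the cards above can be added up to see how a typical mRNA is divided. A minimal sketch, assuming the three averages can simply be summed (cap and polyA tail are ignored); the dictionary and variable names are illustrative only.

```python
# Average sizes in bp, as given on the cards above
avg_len_bp = {"5'UTR": 170, "CDS": 1500, "3'UTR": 700}

# Total length of the transcribed, non-polyA part of an "average" mRNA
total_bp = sum(avg_len_bp.values())

# Order the parts from largest to smallest
ordered = sorted(avg_len_bp, key=avg_len_bp.get, reverse=True)

# Share of each part in the average mRNA
shares = {part: length / total_bp for part, length in avg_len_bp.items()}

print(total_bp)   # 2370
print(ordered)    # ['CDS', "3'UTR", "5'UTR"]
for part in ordered:
    print(f"{part}: {shares[part]:.0%}")
```

The UTRs together make up a bit more than a third of this average mRNA, which fits the cards stating that about 1% of the genome is CDS and about 0.5% is UTR.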
Only 1% of the human genome codes for protein?
True
Only 5% of the genome is regulatory DNA?
True
Only 2% of the genome is coding region?
False, 1%
Only 1% of the genome is UTR?
False, 0.5% is UTR
Only <1% of the genome is noncoding RNA?
True
What percentage of the human genome consists of TEs?
45% of the human genome can be readily classified as transposable elements. More sophisticated searches find 2/3 of the genome to be derived from TEs
Does the human genome consist mostly of TEs?
True
What percentage of the human genome consists of Alu elements?
10.6%
What percentage of the human genome consists of DNA transposons?
2.8%
What is the other name for SDs? Explain.
Low copy repeats (LCRs) = segmental duplications (SDs) are larger regions (e.g. >1 kbp) that are e.g. >90% identical and occur at least twice
Explain highly repetitive DNA.
Shorter DNA pieces that make up >50% of the human genome
What is RepeatMasker?
A program that can serve as a practical definition of highly repetitive DNA
What are the classes of highly repetitive DNA?
– High‐copy number tandem repeats (satellite DNA)
– Transposable elements (or “interspersed repeats”)
  • DNA transposons
  • Retrotransposons
Duplications of DNA sequences occur by different mechanisms and have a huge impact on what?
Shaping a genome
Where are the tandem repeats?
At centromeres and telomeres
What are segmental duplications, basically?
The occurrence of duplicated genomic regions
Define SDs: their length, their frequency, and whether they are just TEs.
Segmental duplications (also termed “low-copy repeats”) are blocks of DNA that range from 1 to >400 kb in length, occur at more than one site within the genome, and share a high level (>90%) of sequence identity
(note: transposable elements are filtered out, i.e. SDs are not “just” TEs)
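The definition above can be written as a simple predicate. This is only a sketch of the thresholds stated on the card, not how any real SD-detection pipeline works; the class `DuplicatedBlock` and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class DuplicatedBlock:
    """Hypothetical record describing a candidate duplicated region."""
    length_bp: int                   # length of the block in bp
    identity: float                  # sequence identity between copies, 0.0 to 1.0
    n_sites: int                     # number of genomic sites where the block occurs
    is_transposable_element: bool    # True if the block is just a TE copy


def is_segmental_duplication(block: DuplicatedBlock) -> bool:
    """Apply the card's criteria: >=1 kb, >90% identical, >=2 sites, not a TE."""
    return (
        block.length_bp >= 1_000
        and block.identity > 0.90
        and block.n_sites >= 2
        and not block.is_transposable_element
    )


# A 25 kb block, 96% identical, present at two sites: counts as an SD
print(is_segmental_duplication(DuplicatedBlock(25_000, 0.96, 2, False)))  # True
# A 300 bp Alu-like repeat present at many sites: filtered out
print(is_segmental_duplication(DuplicatedBlock(300, 0.92, 1_000, True)))  # False
```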
What percentage of the human genome falls in SDs?
5%
Are many genes part of SDs?
Yes. SDs are enriched in regions without genes, but many genes are also part of SDs. There is also an about 10‐fold enrichment close to telomeres and centromeres
True or false?
a) SDs are hotspots for disease mutations (that are difficult to detect with short-read technologies)
b) SDs are hotspots for very complex genomic regions
Both true
Do more SDs generate more SDs?
Yes!!
There are different types of tandem repeats; which is the most abundant one?
Alpha satellites at centromeres
What fraction of the genome is covered by Segmental duplications?
5% of the human genome falls in SDs
SDs are hotspots for disease mutations. Why are these mutations a problem in practice?
They are difficult to detect with short-read technologies
Do segmental duplications matter for genome assemblies?
Yes
In which chromosomal regions are SDs enriched?
10‐fold enrichment close to telomeres and centromeres; enriched in regions without genes, but many genes are also part of SDs
What are other names for TEs?
“Jumping genes” or “selfish genetic elements”
What are TEs?
Transposable Elements are pieces of DNA that can replicate independently within a genome
TEs occur in all species from bacteria to humans (just like viruses). True?
Yes
By whom and where were TEs originally discovered?
In maize, by Barbara McClintock, in the 1940s (Nobel Prize 1983)
Are TEs important?
Yes, transposons are important tools (transposon mutagenesis; Nextera for Illumina)
What is selfish DNA good for?
TEs can be seen as selfish DNA or genomic parasites: “The spread of selfish DNA sequences within the genome can be compared to the spread of a not‐too‐harmful parasite within its host”. Asking what they are good for is like asking what viruses are good for
How many types of jumping do TEs have?
Two types of jumping: 1) Via an RNA intermediate: retrotransposon (reverse transcription makes a DNA copy, which is then integrated) 2) Via a DNA intermediate: DNA transposon (similar to RNA and DNA viruses)
What does an autonomous element encode?
An autonomous element encodes the necessary regulatory DNA and the protein(s) needed for transposition
What does a non-autonomous element encode?
A non-autonomous element encodes only the necessary regulatory DNA and uses proteins from an autonomous element
What are the direct consequences of a new TE insertion?
• Neutral (most likely) => the TE insertion will acquire random mutations and eventually (>300 million years) mutate beyond recognition
• Bad => host and TE insertion die sooner or later
• Advantageous (rare) => the TE insertion will be maintained
As with gene duplication, a TE insertion is most often neutral, sometimes deleterious, rarely adaptive. True or false?
True
What is the long-term consequence of TEs for the host?
Hosts need to defend themselves against TEs
• Many host defense mechanisms exist (piwiRNAs, zinc fingers, etc.)
• Just as for the immune system, there is a constant arms race between TEs and the host defense
• This arms race has strongly shaped genomes (origin of epigenetic mechanisms, RNA regulation)
The current TE content of a genome is the result of all past TE pandemics and the mutations and selection events that followed. T or F?
True
What are the only transposable elements still active in the human genome?
LINE‐1 (L1) and Alu are the only transposable elements still active in the human genome (~1 TE event every 20 generations)
Are the two most common TEs in the human genome retrotransposons?
Yes: LINE‐1 (autonomous) and Alu (non-autonomous, needs L1 gene products)
RNA transcription from an internal promoter: which RNA polymerase transcribes L1, and which transcribes Alu?
L1: RNA Pol II
Alu: RNA Pol III
Translation of reverse transcriptase: which ORFs do L1 and Alu have?
L1: ORF1 and ORF2
Alu: no ORF!
Reverse transcription and integration: new copies are often truncated if not …?
Copies are often truncated if the full mRNA is not reverse transcribed => new copies are often inactive
a) L1 elements can be seen as..
b) Alu elements can be seen as…
a) L1 elements can be seen as genomic parasites
b) Alu elements can be seen as parasites of L1 elements
Transposition often leads to full-length copies. True or false?
False, transposition often does not lead to full-length copies
Alus evolved from 7SL RNA. What is 7SL RNA?
7SL RNA is an RNA Pol III-transcribed RNA that is part of the signal recognition particle, which is involved in translocating peptides into the ER during translation
Tell me more about Alu families.
Different families of Alus are distinguished, similar to different subtypes of a virus. Alu Yc1 is still a little active in the human genome (~1 event/genome every 20 generations)
Is Alu Yc1 still a little active in the human genome?
Yes, Alu Yc1 is still a little active in the human genome (~1 event/genome every 20 generations)
How do TEs emerge, spread, and go extinct?
It is just like a viral epidemic, with the difference that the products partly remain (and of course continue to mutate) in the host genome, i.e. the current genome is the sum of past epidemics + the mutations that occurred since each epidemic
What is another way to look at the history of TE pandemics that hit the germline of the genome's ancestors?
The age (= % substitutions relative to a common ancestor sequence or consensus sequence) of TEs
The more substitutions, the older the element! True or false?
True!!
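The "age as % substitutions from the consensus" idea can be sketched in a few lines. This assumes the TE copy and the consensus are already aligned to the same length (gaps written as "-") and simply counts mismatches; real analyses use substitution models, so treat this as an illustration only. The function name and the toy sequences are made up.

```python
def percent_substitutions(copy: str, consensus: str) -> float:
    """Percent of aligned, ungapped positions where the copy differs from the consensus."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps, count substitutions only
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


consensus = "ACGTACGTAC"
young_copy = "ACGTACGTAC"   # identical to the consensus
old_copy = "ACTTACGAAC"     # two substitutions out of ten positions

print(percent_substitutions(young_copy, consensus))  # 0.0
print(percent_substitutions(old_copy, consensus))    # 20.0
```

The copy with more substitutions is interpreted as the older insertion, because it has had more time to accumulate mutations since it jumped.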
Certain Alu families are more frequent than others in the human genome. T or F?
True
The average pairwise sequence difference between TEs is not homogeneously distributed. T or F?
True
Which element of rodents also evolved from 7SL RNA?
The B1 element of rodents also evolved from 7SL RNA
Different species have (partly) the same histories of TEs?
False, different species have (partly) different histories of TEs
SINE elements (non-autonomous …..) that evolved from a … gene
SINE elements (non-autonomous retrotransposons) that evolved from a tRNA gene
Maize has more TEs than Arabidopsis?
Yes; maize also has the larger genome (Gbp)
Smallest and largest vertebrate genome:
a) Tetraodon fluviatilis (pufferfish)
b) Protopterus aethiopicus (lungfish)
a) 350 Mbp
b) 132 Gbp
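To get a feeling for the range, the two genome sizes above can be compared directly. A small sketch using only the numbers on the card; the variable names are illustrative.

```python
# Genome sizes from the card above, converted to bp
pufferfish_bp = 350e6   # Tetraodon fluviatilis, 350 Mbp
lungfish_bp = 132e9     # Protopterus aethiopicus, 132 Gbp

# How many times larger the largest vertebrate genome is than the smallest
fold_difference = lungfish_bp / pufferfish_bp

print(f"about {fold_difference:.0f}-fold")  # about 377-fold
```

Both species are vertebrates of broadly similar complexity, so a difference of this size is a concrete example of the C-value paradox covered in the following cards.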
What is the C‐value?
The haploid nuclear DNA content; it has most often been measured by weighing the DNA content of nuclei.
[Figure: haploid nuclear DNA content in million bp from ~10,000 species (star = human)]
Genome size correlates with TE content.
T or F?
True
Genome size, complexity and the C-value paradox
• It has been known for a long time that genome size varies widely among species with similar complexity
• This has been called the C‐value paradox
• TEs can readily explain this (=> it is not a paradox)
• It seems that a higher DNA content is not such a big disadvantage for slower-growing (multicellular) organisms
• => The balance between the rate of generating DNA and the rate of DNA loss mainly determines genome size
• The complexity of multicellular organisms is correlated with neither genome size nor gene count!
• A high proportion of genomes is junk (= non‐functional DNA)
Tell me about junk DNA.
Susumu Ohno liked explicit statements and called this DNA junk DNA in 1972. It is a general term for non-functional DNA that raised and still raises a lot of emotions
Why is it hard to analyze SDs?
Because the copies look similar
Is there a relationship between complexity and genome size?
No
The onion genome is bigger because of …?
TEs
SDs cannot be polymorphic?
False
SDs are high-copy-number repeats? (classes of repeats)
False. The classes of repeats are:
– High copy tandem (satellites, minisatellites, microsatellites)
– Low copy (≈ segmental duplications)
– Transposable elements (TEs)