Genes and Genomes Flashcards
Definition
a method of DNA sequencing first commercialized by Applied Biosystems, based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication
Sanger sequencing
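The chain-termination principle can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a model of real chemistry. There are four reactions, one per ddNTP, as in the original method; automated machines instead use four dye colours in one reaction. The template is written in the order the polymerase reads it, and termination can occur at every matching position. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_terminated_fragments(template: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each of four reactions, one ddNTP per reaction.

    Toy assumptions: the template is written in the order the polymerase reads it,
    the primer adds no length, and termination can happen at every matching position.
    """
    # The polymerase builds the complementary strand one base at a time.
    synthesized = "".join(COMPLEMENT[base] for base in template.upper())
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        # Incorporating the ddNTP for `base` here stops extension,
        # leaving a fragment of exactly this length in that reaction.
        lanes[base].append(length)
    return lanes

def read_sequence(lanes: dict[str, list[int]]) -> str:
    """Order all fragments by length (as electrophoresis does) and read each terminal base."""
    fragments = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in fragments)

lanes = chain_terminated_fragments("TACGGA")
print(lanes)
print(read_sequence(lanes))
```

Reading the fragments from shortest to longest recovers the synthesized strand, which is the complement of the template.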
Definition
the material of which the chromosomes of organisms other than bacteria (i.e. eukaryotes) are composed, consisting of protein, RNA, and DNA
Chromatin
Define
Methyl-cytosine
the normal cytosine nucleotide in DNA that has been modified by the addition of a methyl group to its 5th carbon
Definition
non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates
SINES
What is considered the fifth base in DNA?
Methyl-cytosine
Definition
a unit of adjacent genes (typically in bacteria) that are transcribed together from a single promoter into one multicistronic mRNA, so that they are regulated as a group
Operon
Mobile genetic elements are not usually found in gene exons/introns. Examples are retrotransposons which move via a DNA/RNA intermediate
Mobile genetic elements are not usually found in gene exons. Examples are retrotransposons, which move via an RNA intermediate
Where are CpG islands usually found?
Mainly at the 5’ end of genes
How many bases does the human genome contain?
3162 million bases
Whole genome shotgun (WGS)
entails sequencing many overlapping DNA fragments in parallel and then using a computer to assemble the small fragments into larger contigs and, eventually, chromosomes
Definition
a functional RNA molecule that is transcribed from DNA but not translated into proteins
non-coding RNA/ncRNA
What is the Whole Genome Shotgun Method?
Genomic DNA is randomly shredded into fragments, which are then read. This is repeated many times to ensure at least 30x read-depth coverage. The reads are then reassembled computationally into the genome sequence
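The read depth arithmetic behind "30x coverage" can be sketched as follows. Mean coverage is c = N × L / G for N reads of length L over a genome of G bases. Under the Lander-Waterman (Poisson) assumption, the expected fraction of bases covered by no read is e^(-c). The 150-base read length and the function names are illustrative assumptions. The simulation models random shredding by placing reads uniformly.

```python
import math
import random

def reads_needed(genome_size: int, read_length: int, coverage: float) -> int:
    """Reads N needed for a mean depth c = N * L / G."""
    return math.ceil(coverage * genome_size / read_length)

def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read: e^(-c)."""
    return math.exp(-coverage)

def simulate_depth(genome_size: int, read_length: int, n_reads: int, seed: int = 0) -> list[int]:
    """'Shred' a genome at random: place reads uniformly and count per-base depth."""
    rng = random.Random(seed)
    depth = [0] * genome_size
    for _ in range(n_reads):
        start = rng.randrange(genome_size - read_length + 1)
        for i in range(start, start + read_length):
            depth[i] += 1
    return depth

HUMAN_GENOME = 3_162_000_000  # bases, as given on the genome-size card
print(reads_needed(HUMAN_GENOME, 150, 30))  # assumes 150-base reads

# Small simulated genome: mean depth is 30, but individual bases vary around it.
depth = simulate_depth(10_000, 100, reads_needed(10_000, 100, 30))
print(sum(depth) / len(depth), min(depth), max(depth))
```

Random placement is why high mean coverage is needed. Depth fluctuates from base to base, so some regions fall well below the mean.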
Definition
an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences
BLAST search
What makes up junk DNA?
Pseudogenes
Mobile genetic elements (i.e. LINES, SINES, incomplete retroviral-like elements and transposon remnants)
Definition
Describing a type of messenger RNA that can encode more than one polypeptide separately within the same RNA molecule
Multicistronic
What is used to sort out the contigs given in de novo assembly?
PacBio long-read sequencing (long reads can span repeats and link contigs together)
What is a hypothetical protein?
A predicted protein that is not similar to any characterised protein
What BLAST program is used for a protein query search in the protein database?
BLASTp
What are the major characteristics of SINES?
They do not encode reverse transcriptase, endonuclease or integrase
Definition
a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes
FASTA
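A minimal FASTA reader shows how the format is structured. A header line starts with `>`, and the sequence follows on one or more lines of single-letter codes. This is a sketch that assumes the identifier is the first word of the header. It also assumes blank lines can be skipped. `parse_fasta` is a hypothetical helper, not a library function.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumptions: the identifier is the first word after '>', sequence lines may be
    wrapped across several lines, blank lines are ignored, and lines before the
    first header are skipped.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # finish the previous record
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # single-letter nucleotide or amino-acid codes
    if header is not None:
        records[header] = "".join(chunks)
    return records

example = """>seq1 a nucleotide sequence
ATGCGT
ACGT
>prot1 a protein sequence
MKV
"""
print(parse_fasta(example))
```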
Define
non-coding RNA/ncRNA
a functional RNA molecule that is transcribed from DNA but not translated into proteins
Define
Draft genome sequence
Sequence of genomic DNA having lower accuracy than finished sequence; some segments are missing or in the wrong order or orientation
Definition
Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome
Retroviral-like elements
Definition
a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores
FASTQ
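A FASTQ record stores each base alongside an encoded quality score. A minimal parser makes this concrete, assuming four-line records with no line wrapping and the common Phred+33 encoding (score = ASCII code − 33). By the Phred definition, Q = −10 log10(P_error). The function names are hypothetical.

```python
def parse_fastq(text: str) -> list[tuple[str, str, list[int]]]:
    """Parse four-line FASTQ records into (id, sequence, Phred scores).

    Assumptions: one line each for header, sequence, '+' separator and quality
    (no line wrapping), and Phred+33 quality encoding.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete four-line records")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        # Each quality character encodes one base's score: Q = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in qual]
        records.append((header[1:], seq, scores))
    return records

def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)

example = "@read1\nGATTACA\n+\nIIIII#!\n"
for name, seq, scores in parse_fastq(example):
    print(name, seq, scores)
```

Here `I` encodes Q40 (an error probability of 1 in 10,000), while `!` encodes Q0.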
Define
Genome annotation
the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do
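One very simple ingredient of locating coding regions is scanning for open reading frames (ORFs). An ORF runs from an ATG start codon to the first in-frame stop codon. The sketch below is a deliberately naive illustration of that idea, not how annotation pipelines actually predict genes; they also use gene models, splicing signals and homology evidence. It only scans the three forward frames, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Naive forward-strand ORF scan: ATG ... first in-frame stop codon.

    Returns (frame, start, end) with 0-based start and exclusive end, the stop codon
    included. The reverse strand, alternative start codons and splicing are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this reading frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG after this stop
    return orfs

dna = "CCATGAAATTTTAGCC"
for frame, start, end in find_orfs(dna):
    print(frame, start, end, dna[start:end])
```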
Define
Retroviral-like elements
Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome
Define
Paralogue
Either of a pair of genes in the same genome that derive from the same ancestral gene by gene duplication
Define
SINES
non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates
The ENCODE project is an editing/annotation approach that has built a map of functional elements within the human genome, suggesting that over 50%/70% is biologically active
The ENCODE project is an annotation approach that has built a map of functional elements within the human genome, suggesting that over 70% is biologically active
What is the genome data problem?
The ever-increasing analysis gap: our ability to analyse genomic data is not keeping pace with the amount of data being generated
Definition
a project (the Encyclopedia of DNA Elements) that seeks to identify all the functional elements in the human genome sequence
ENCODE project
What were the strategies used by HGP and Celera to sequence the human genome?
HGP used an ordered or directed (hierarchical, clone-by-clone) strategy
Celera used a whole-genome shotgun strategy
Define
Pseudogenes
a section of a chromosome that is an imperfect copy of a functional gene
What were the key findings of the ENCODE project?
Around 80% of the human genome is associated with at least one biochemical event
___________ arise by gene duplication followed by gene inactivation - contain introns
____________ are formed by integration of DNA copies of mRNA - do not contain introns
Classical pseudogenes arise by gene duplication followed by gene inactivation - contain introns
Processed pseudogenes are formed by integration of DNA copies of mRNA - do not contain introns
Definition
DNA that does not code for a protein, usually occurs in repetitive sequences of nucleotides, and does not seem to serve any useful purpose
Junk DNA
True or False:
The HGP sequence tells us nothing about the genetic variation between individuals
True
What BLAST program is used for a nucleotide query search in the protein database?
BLASTx
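The two BLAST cards follow a general pattern. The standard BLAST programs differ in which sequence types they compare, and where one side is nucleotide and the other protein, the nucleotide side is translated. The lookup below summarises this. The program names are the standard ones. `choose_blast_program` and `translate_both` are hypothetical names used only for illustration.

```python
# Which BLAST program compares which kind of query against which kind of database.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query translated in all six frames
    ("protein", "nucleotide"): "tblastn",  # database translated in all six frames
}

def choose_blast_program(query_type: str, database_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program name for a query/database type combination."""
    if translate_both:
        # tblastx translates both a nucleotide query and a nucleotide database.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(f"unknown sequence types: {query_type!r}, {database_type!r}") from None

print(choose_blast_program("protein", "protein"))
print(choose_blast_program("nucleotide", "protein"))
```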
Definition
Either of a pair of genes in the same genome that derive from the same ancestral gene by gene duplication
Paralogue
Why does the sequence CpG occur at a lower than expected frequency in vertebrates?
Spontaneous deamination is a common form of DNA damage. Deamination of unmethylated C gives rise to U, which does not belong in DNA and is recognised and repaired by DNA repair machinery. Deamination of methylated C gives rise to T, a normal DNA base, so the change is often not recognised as an error. Over evolutionary time, many methylated CpGs have therefore mutated to TpG, so CpG is under-represented in vertebrate DNA
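CpG under-representation is usually measured as the observed/expected CpG ratio: count(CG) × length / (count(C) × count(G)). The ratio is about 1 in a random sequence and well below 1 in vertebrate DNA. In the sketch below, the deamination step is a deliberately extreme toy assumption: every CpG is treated as methylated and converted to TpG. It shows only the direction of the effect, not real mutation rates. The function names are hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)

def deaminate_methylated_cpgs(seq: str) -> str:
    """Toy model: every CpG is methylated and its C deaminates to T (CpG -> TpG)."""
    return seq.upper().replace("CG", "TG")

before = "ACGTTCGAACGG"
after = deaminate_methylated_cpgs(before)
print(after, cpg_observed_expected(before), cpg_observed_expected(after))
```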
Define
CpG island
stretches of DNA 500–1500 bp long with a CG:GC ratio of more than 0.6, and they are normally found at promoters and contain the 5′ end of the transcript
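The definition above can be turned into a sliding-window scan, sketched below. The sketch interprets "CG:GC ratio" as the number of CpG dinucleotides divided by the number of GpC dinucleotides, using the card's 500 bp window and 0.6 cut-off. Several details are assumptions, not part of the definition. The 50 bp step is one. The extra requirement that GC content exceed 50% is another; it is borrowed from commonly used island criteria and prevents a single CpG in an AT-rich window from qualifying. Many tools use the observed/expected CpG ratio instead of CG:GC. All names are hypothetical.

```python
def cg_gc_ratio(window: str) -> float:
    """Ratio of CpG to GpC dinucleotides (neither can overlap itself, so str.count is exact)."""
    cpg, gpc = window.count("CG"), window.count("GC")
    if gpc == 0:
        return float("inf") if cpg else 0.0
    return cpg / gpc

def find_cpg_islands(seq: str, window: int = 500, step: int = 50,
                     min_ratio: float = 0.6, min_gc: float = 0.5) -> list[tuple[int, int]]:
    """Slide a window along seq and merge overlapping windows that pass both cut-offs.

    window=500 and min_ratio=0.6 follow the definition; step and min_gc are assumptions.
    Returns (start, end) pairs, 0-based with exclusive end.
    """
    seq = seq.upper()
    islands: list[tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        gc_content = (w.count("G") + w.count("C")) / window
        if gc_content > min_gc and cg_gc_ratio(w) > min_ratio:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], end)  # extend the current island
            else:
                islands.append((start, end))
    return islands

test_seq = "AT" * 300 + "CG" * 300 + "AT" * 300  # CpG-rich block at positions 600-1199
print(find_cpg_islands(test_seq))
```

Because windows are merged, the reported island can extend somewhat beyond the CpG-rich block itself.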
How do SINES move?
Using enzymes produced by other mobile elements e.g. LINES
Definition
a set of overlapping DNA segments that together represent a consensus region of DNA
Contig
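How overlapping reads collapse into a contig can be shown with a toy greedy assembler. It repeatedly merges the pair of sequences with the longest suffix-prefix overlap. This is a named simplification, not the method real assemblers use. They rely on overlap-layout-consensus or de Bruijn graphs, handle sequencing errors and repeats, and build a true consensus. The sketch also assumes error-free reads from one strand. The function names and `min_overlap` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlap >= min_overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # remaining pieces cannot be joined: separate contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

print(greedy_assemble(["ATGCGTA", "GTACGTT", "CGTTAG"]))
print(greedy_assemble(["AAAA", "CCCC"]))
```

Reads with no sufficient overlap are left as separate contigs, which is why gaps between contigs remain until more data (such as long reads) links them.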
Zero
What is an unbroken consensus sequence called?
Contig
True or False:
The sequence data generated by the HGP is inaccessible to the general public
False
It is publicly available
Definition
entails sequencing many overlapping DNA fragments in parallel and then using a computer to assemble the small fragments into larger contigs and, eventually, chromosomes
Whole genome shotgun (WGS)