World of Genomics Flashcards
What are the two types of grooves in DNA?
Major grooves and minor grooves.
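What bonds between the bases hold the two DNA strands together?
Hydrogen bonds.
What bonds link the nucleotides along a single strand of DNA?
Strong covalent (phosphodiester) bonds.
Because the hydrogen bonds between the strands are much weaker than the covalent bonds within each strand, the two strands can be melted apart and annealed back together without breaking. This is done in PCR.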
What is DNA?
DNA is a polymer of nucleotides containing the bases adenine (A), cytosine (C), guanine (G) and thymine (T)
DNA molecules are…
…double-stranded and the two strands ‘pair’ together with complementary sequences forming the pairs
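The pairing rule can be sketched in code: A pairs with T and C pairs with G, and because the two strands run in opposite directions, the partner of a strand is its reverse complement. This is a minimal Python sketch; the name `reverse_complement` is illustrative and not from the cards.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand):
    """Return the complementary strand, read in its own 5'->3' direction."""
    return "".join(PAIRS[base] for base in reversed(strand.upper()))

print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is one way to check the pairing table.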
How can we sequence DNA?
we can sequence DNA by the chain termination method, which uses dideoxynucleotides
the use of robotics has automated this so that millions of base pairs can be analysed per day
What has sequencing dna led to?
we have access to large amounts of DNA sequence – primary data
how does information flow in the cell?
information flows from DNA in the genome to RNA in the gene transcript to protein which is generally the functional end of information flow
What sources do we get primary data from?
We can get primary data then from three sources:
1 – the DNA sequence itself – sequencing genes - the genome
2 – the mRNA – the transcriptome
3 – proteins – via protein sequencing – the proteome
What is the Transcriptome?
the sum total of all transcripts in a cell/organism/tissue/organ.
What is the proteome?
the sum total of all proteins in a cell, organism, tissue, or organ.
DNA is transcribed to…
…mRNA, which is then translated into protein.
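The flow from DNA to mRNA to protein can be sketched in Python. This is a simplified illustration, assuming the input is the coding strand and that translation starts at the first base; the function names are illustrative. The codon table is the standard genetic code, built compactly from the usual U, C, A, G ordering.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand):
    """The mRNA has the coding strand's sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))  # AUGGCCUAA MA
```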
Data from genome sequencing is converted into…
…text containing strings of
A, C, G and T
In genome sequences/sequencing, what is the main thing we are concerned with?
It is the order of the letters (A,C,G,T) that we are concerned about here
Where is data generated from genome sequencing projects deposited?
the data generated from genome sequencing projects is deposited in databases which can be ‘searched’ by users of the database
One of the main key points of modern genomics is…
…studying gene expression.
Each cell contains the same genes but…
…cells differ in which genes are turned on or off. This underlies cell differentiation.
Knowing the sequence does not…
…tell us a lot.
why do we need to study gene expression?
- Studying gene expression allows us to understand what particular genes encode and what they do.
- If you can understand what cells have the genes turned on or off, you can deduce what the gene does.
- For example, if a gene is only turned on in brain tissue, it is likely to be a gene to do with the brain.
In cells, some genes will be…
…transcribed and others turned off
Some genes will have high expression, and some will…
…be expressed at low levels.
Describe differential gene expression.
- Some genes transcribed, some not transcribed. (turned on or turned off)
- Some genes having high levels of expression, some genes having low levels of expression.
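The on/off and high/low views of expression can be illustrated with a small Python sketch. The expression values, gene names and the `ON_THRESHOLD` cut-off are all invented for illustration; real data would come from an experiment.

```python
# Invented expression levels (arbitrary units), for illustration only.
EXPRESSION = {
    "gene_a": {"brain": 120.0, "liver": 0.0, "muscle": 0.0},
    "gene_b": {"brain": 15.0, "liver": 300.0, "muscle": 20.0},
}
ON_THRESHOLD = 1.0  # assumed cut-off: below this the gene counts as "off"

def on_off(levels):
    """Qualitative view: is the gene on or off in each tissue?"""
    return {t: ("on" if v >= ON_THRESHOLD else "off") for t, v in levels.items()}

def highest_tissue(levels):
    """Quantitative view: the tissue with the highest expression."""
    return max(levels, key=levels.get)

for gene, levels in EXPRESSION.items():
    print(gene, on_off(levels), "highest in", highest_tissue(levels))
```

Here `gene_a` is on only in brain, which is the kind of pattern that suggests a brain-related function.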
It is differential gene expression which defines each cell in terms of:
1 – its particular function
2 – its developmental stage
3 – its response to environmental cues
4 – its genotype (wildtype or mutant)
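What is a qualitative difference in gene expression?
Whether a gene is on or off (as opposed to a quantitative difference, which is how highly it is expressed).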
Describe how developmental stage changes expression.
During the development of red blood cells, particular genes are on while the cell is immature and off once it is mature. Developmental stage changes expression.
What's an example of how we can study gene expression?
studying gene expression using northern blotting
What is northern blotting?
When people invented a way to detect RNA, they called it a northern blot. It is used to determine the level of RNA molecules from a gene.
What is a Southern blot?
The first person to develop a way to detect specific DNA molecules, using radioactively labelled DNA probes, was Ed Southern. The technique was named the Southern blot after him.
What's a western blot?
When people invented a way to detect proteins, they called it a western blot.
Why would we want to study gene expression?
imagine we have a gene that we are interested in and we want to know which tissues it is expressed in (i.e. we want to know about its regulation)
one way that we can do this is to use a technique which examines the quantity of mRNA molecules from that gene in different tissues
Describe studying gene expression.
we can isolate RNA molecules
we can separate them with gel electrophoresis
we can then identify them on a gel
How does gel electrophoresis work?
In gel electrophoresis, nucleic acids such as DNA and RNA separate according to size (molecular weight) when an electric current is run through the gel: their negative charge pulls them towards the positive electrode, and smaller molecules move through the gel faster.
Why do we use northern blots instead of gel electrophoresis when studying gene expression?
Gel electrophoresis does not tell us a lot on its own, since we cannot tell which band comes from which gene. This is where northern blots are used.
Describe northern blots.
RNA is transferred to a nylon or nitrocellulose membrane
specific RNA molecules can be detected by hybridisation as long as you have a DNA probe that is detectable
it is detectable if labelled with radioactive nucleotides or labelled with a molecule that can be detected by antibodies
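Detection by hybridisation can be sketched as a string search: a DNA probe binds an RNA that contains the probe's complementary sequence. This is a simplified Python sketch that assumes a perfect match is required (real hybridisation tolerates some mismatches); the lane contents and the probe are invented.

```python
# DNA base on the probe -> RNA base it pairs with.
PAIRS = {"A": "U", "C": "G", "G": "C", "T": "A"}

def probe_target(probe_dna):
    """RNA sequence (5'->3') that the DNA probe would base-pair with."""
    return "".join(PAIRS[base] for base in reversed(probe_dna.upper()))

def hybridises(probe_dna, rna):
    """True if the RNA contains the probe's exact complementary site."""
    return probe_target(probe_dna) in rna.upper()

# Invented RNA molecules, one per lane of the blot.
lanes = {"lane_1": "GGAUGGCCUAAC", "lane_2": "GGGGCCCCAAAA"}
probe = "TTAGGCCAT"

for lane, rna in lanes.items():
    print(lane, "signal" if hybridises(probe, rna) else "no signal")
```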
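What is the purpose of the β-tubulin lane in northern blots?
β-tubulin is a component of cytoskeletal microtubules, so it is expressed in essentially all cells. Its signal therefore typically serves as a loading control, showing that similar amounts of RNA were loaded in each lane.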
When is a northern blot used?
if we just want to look at individual genes
When can northern blots only reasonably be done?
if you have a hypothesis which suggests that these genes may be up-regulated – otherwise which genes do you test your northern blot with?
When might we want to look at the whole transcriptome / more than just one gene?
when we want to look at the response of all genes to a certain condition
Give an example of when we want to look at multiple genes ?
this is a good example – there are mutant mice that are obese – this stems from a mutation in a particular gene
in order to see what effect this gene has on the mouse it is necessary to see what other genes are turned up or turned down – how could we do this?
if we were interested in a given tissue type and wanted to know what were the genes expressed in that tissue does that mean we need to examine each gene individually?
we can examine all genes together using microarrays
What enzyme copies RNA into DNA?
RNA is copied into complementary DNA (cDNA) by reverse transcriptase.
What can microarrays help us understand?
Using microarrays, we can understand which genes are turned on or off in certain diseases. This helps us work out how diseases such as cancer come about, and then we can look at developing potential new treatments.
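One common way to summarise "turned up or turned down" is the log2 ratio of a gene's signal in two samples. The Python sketch below uses invented spot intensities and an assumed cut-off of a two-fold change; neither comes from the cards.

```python
import math

# Invented spot intensities, for illustration only: (healthy, diseased).
INTENSITIES = {
    "gene_a": (100.0, 800.0),
    "gene_b": (400.0, 100.0),
    "gene_c": (250.0, 260.0),
}

def log2_fold_change(control, treated):
    """Positive values mean higher signal in the treated sample."""
    return math.log2(treated / control)

def call(lfc, cutoff=1.0):
    """Classify a gene; a cutoff of 1.0 corresponds to a two-fold change."""
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "unchanged"

for gene, (healthy, diseased) in INTENSITIES.items():
    lfc = log2_fold_change(healthy, diseased)
    print(f"{gene}: log2 fold change {lfc:+.2f} ({call(lfc)})")
```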
What is an array in microarrays?
an array is basically a large number of ‘spots’ of a DNA probe that is covalently linked to a substrate such as a glass slide or a silicon chip
What are arrays constructed by?
the arrays are generally constructed by robots using either small (25bp) oligonucleotide probes or larger cDNA probes
How is data dealt with?
in order to deal with this amount of information we need to use computers and sophisticated software
this is where the science of bioinformatics comes in
we can use DNA sequence data to …
…construct relationships of organisms
we can use phylogenetic trees to …
…examine relationships of proteins to each other within the genome
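A first step towards a tree is often a table of pairwise distances between aligned sequences. The Python sketch below computes the proportion of differing positions (the p-distance) for three invented, already aligned sequences; building the tree itself is not shown.

```python
from itertools import combinations

# Invented aligned sequences of equal length, for illustration only.
ALIGNED = {
    "species_a": "ATGCCGTA",
    "species_b": "ATGCCGTT",
    "species_c": "TTGACGAA",
}

def p_distance(seq1, seq2):
    """Fraction of aligned positions at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

for name1, name2 in combinations(ALIGNED, 2):
    print(name1, name2, p_distance(ALIGNED[name1], ALIGNED[name2]))
```

The pair with the smallest distance (here `species_a` and `species_b`) would be grouped together first.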
we all know about the human genome project but…
…what is less known is that many other species now have their whole genomes sequenced
as sequencing of DNA gets cheaper …
…more genomes can be sequenced
the cost of the human genome project was …
…in the billions of dollars, yet only 10 or so years later the price had dropped to a few thousand dollars per genome
What is Moore’s Law?
this states that the number of transistors on a chip, and with it roughly the processing power of computers, doubles about every two years
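The fall in sequencing cost can be compared with a two-year doubling using a little arithmetic. The figures below are round-number assumptions standing in for "billions of dollars" and "a few thousand dollars" over 10 years; they are not measured costs.

```python
import math

START_COST = 1e9  # assumed round figure for "billions of dollars"
END_COST = 1e3    # assumed round figure for "a few thousand dollars"
YEARS = 10

# How many times did the cost halve, and how often?
halvings = math.log2(START_COST / END_COST)
halving_time = YEARS / halvings

# For comparison: a doubling every two years over the same period.
two_year_doublings = YEARS / 2
two_year_factor = 2 ** two_year_doublings

print(f"cost halvings: {halvings:.1f}")               # 19.9
print(f"one halving every {halving_time:.1f} years")  # 0.5
print(f"two-year doubling gives a {two_year_factor:.0f}-fold change")  # 32
```

Under these assumptions the cost halved roughly every six months, far faster than a doubling every two years.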
Why might a person's genome be sequenced?
A person's genome can be sequenced to see which drugs the patient will respond best to.
What does Gene Ontology give us?
it gives us a logical means of searching databases – all genes are assigned GO terms
the genes that are differentially expressed in a human breast cancer are arranged according to …
their biological function – in many cases these genes encode proteins common to signalling/enzymatic pathways
What are paralogues?
Genes in the same genome that arose by gene duplication and are members of a gene family
the different globins are …
…expressed at different times during development
Foetal globins have…
…higher oxygen affinity.
Why do genes duplicate?
in many cases duplication arises due to errors in either meiosis or DNA replication
In meiosis there can be a…
…mismatch during alignment at the metaphase plate and in DNA replication slippage may occur
In meiosis and dna replication what type of duplication occurs?
tandem duplication occurs where the copy sits on a chromosome next to the original gene
because of the presence of gene duplication in more complex genomes it gives space for …
…genes to evolve new functions – this in part has been cited as one of the reasons for increasing complexity over time
What happens as the size of prokaryotic genomes increases?
as the size of prokaryotic genomes increases then more of their genes tend to be members of gene families
in more complex eukaryotic genomes the percentage of genes in gene families is …
… even larger
Percentage of genes in gene families in more complex eukaryotic genomes is larger than…
…the percentage of genes in gene families in prokaryotes
through evolutionary history there is a tendency for organisms to …
…become more complex
Gene duplication may be important in…
…the increase in complexity
‘GeneCards’ is a …
…simple entry point to databases
having the genome sequence in itself doesn’t …
…give too much information.
- the genome needs to be annotated
the genome sequence needs to be annotated:
1 – gene prediction – computer programs/BLAST/experimental
2 – gene annotation (structural and functional) - experimental
What are computer programs able to predict?
computer programs are able to predict the presence of genes from raw genome sequence data
What do computer programs use to predict the presence of genes from raw genome sequence data ?
- sequence that appears to encode a protein that is homologous to known proteins
- known intron/exon junction sites
- known promoter sequences/regulatory elements
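The simplest signal such programs look for is an open reading frame: a start codon followed in the same frame by a stop codon. The Python sketch below is a toy version that scans only the forward strand and ignores introns, promoters and homology; the function name and the `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ATG...stop stretches.

    Only the forward strand is scanned; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk along the same reading frame until the first stop codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs

sequence = "CCATGGCCAAATAGGG"  # invented example
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])  # 2 14 ATGGCCAAATAG
```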
Genome sequence is stored in…
…databases that are publicly accessible.
these can then be searched for potential genes or potential similarity to known genes
What is the usual way to search for homology?
the usual way for searching for homology is to use an algorithm called BLAST
What does BLAST stand for?
(Basic Local Alignment Search Tool)
How are results shown on BLAST?
using NCBI’s BLAST tool the results are shown graphically with colour coding for matches.
the results are also shown as an ‘E value’
What is the E value?
the Expect value (E) is a parameter that describes the number of hits you can “expect” to see by chance when searching a database of a particular size
How does E value decrease?
decreases exponentially as the Score (S) of the match increases
What does E value describe?
Essentially, the E value describes the random background noise.
an E value of 1 assigned to a hit can be …
…interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.
The lower the E-value, or the closer it is to zero…
…the more significant the match, that is, the less likely it is to have arisen by chance.
What is the E value the likelihood of?
Likelihood of getting a match “as good as this match” based on random chance alone.
What are E values very dependent on?
E-values are very dependent on the query sequence length and the database size.
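These dependencies can be made concrete with the usual relation between E and the bit score S', E = m × n × 2^(−S'), where m is the query length and n is the total length of the database. The Python sketch below applies it directly; it ignores the effective-length corrections that BLAST itself applies, and the function and parameter names are illustrative.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S') without effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)

base = expect_value(bit_score=50, query_len=100, db_len=1_000_000)
print(f"baseline:           {base:.2e}")
print(f"one more bit:       {expect_value(51, 100, 1_000_000):.2e}")
print(f"database twice as big: {expect_value(50, 100, 2_000_000):.2e}")
```

Each extra bit of score halves E, while doubling the database size or the query length doubles it.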
Short identical sequences may have a …
…high E-value and may be regarded as “false positive” hits.
Describe E-value < 10e-100
identical sequences - long alignments across the entire query and hit sequence
Describe 10e-100 < E-value < 10e-50
almost identical sequences - a long stretch of the query protein is matched to the database.
Describe 10e-10 < E-value < 10e-50
closely related sequences - could be a domain match or similar
Describe 10e-6 < E-value < 1
could be a true homologue but it is a grey area
Describe E-value > 1
proteins are most likely not related
Describe E-value > 10
hits are most likely junk unless the query sequence is very short
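The bands above can be collected into a small lookup, sketched here in Python. Each card is read as the band between the two thresholds it names, and the thresholds are typed exactly as on the cards. The cards leave a gap between 10e-10 and 10e-6, which the sketch reports rather than guesses; the function name is illustrative.

```python
# Lower bounds, largest first, written exactly as on the cards
# (note that the Python literal 10e-100 equals 1e-99).
BANDS = [
    (10, "hits are most likely junk unless the query sequence is very short"),
    (1, "proteins are most likely not related"),
    (10e-6, "could be a true homologue, but it is a grey area"),
    (10e-10, "range not covered by the cards"),
    (10e-50, "closely related sequences"),
    (10e-100, "almost identical sequences"),
]

def describe_e_value(e_value):
    """Return the card description for the band containing e_value."""
    for lower_bound, description in BANDS:
        if e_value > lower_bound:
            return description
    return "identical sequences"

for e in (50, 5, 0.01, 1e-30, 1e-70, 1e-150):
    print(e, "->", describe_e_value(e))
```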