Bioinformatics Flashcards
When was the structure of DNA determined?
1953
Following the determination of the structure of DNA in 1953 and the realisation that this molecule is the carrier of genetic information, it became a scientific priority to …
…determine the precise sequence of nucleotides within chromosomes and find out the relationship between this sequence and the workings of the cell.
In 1977 Fred Sanger published…
… the first “rapid” DNA sequencing method
In 1977 Fred Sanger published the first “rapid” DNA sequencing method
The same year, he published the first …
…complete DNA genome, which was of the Phi X 174 (ΦX174) bacteriophage
In 1995 the first complete genome of a free-living organism was sequenced and published, that of the …
…bacterium Haemophilus influenzae.
Describe the genome of the bacterium Haemophilus influenzae.
Circular DNA genome consisting of
1,830,140 base pairs
How many protein encoding genes does Haemophilus influenzae encode?
Encodes 1740 protein encoding genes
In 1996, the first complete eukaryotic genome was sequenced and published, that of …
…the yeast Saccharomyces cerevisiae
How is DNA organised in the yeast Saccharomyces cerevisiae?
DNA organised on 16 chromosomes consisting of:
12,156,677 base pairs
How many potential genes does the yeast Saccharomyces cerevisiae encode?
6275
In 2003 the complete human genome was …
… sequenced and published.
Since the completion of the human genome there has been an explosion in the amount of …
…DNA sequence data available due to advances in DNA sequencing techniques
Since 1995 the number of DNA sequences deposited into DNA databases has been …
…growing exponentially
In February 2021 the GenBank sequence database contained:
776,291,211,106 bases in 226,241,476 sequence records
There are 3 principal comprehensive databases of nucleic acid sequences in the world, which are:
1) EMBL – European Molecular Biology Laboratory
2) GenBank – National Center for Biotechnology Information (NCBI)
3) DDBJ – DNA Data Bank of Japan
The 3 principal comprehensive databases share…
…information
The 3 principal comprehensive databases share information and hence…
…contain almost identical sequences, and store sequence information that is publicly and freely accessible
Define bioinformatics.
the use of computational methods to study biological data
What is the first definition of bioinformatics?
1) The development of computational methods for studying the structure, function and evolution of genes, proteins and whole genomes.
What is the second definition of bioinformatics?
2) The development of methods for the management and analysis of biological information arising from genomics and high-throughput experiments
What is genomics?
the study of whole sets of genes rather than a single gene
What are high-throughput experiments?
experimental techniques that allow the study of thousands of genes simultaneously, e.g. microarray technology or proteomics
by understanding the processes of mutation and selection that act on DNA sequences, molecular biologists can …
…compare the DNA and protein sequences of common genes between different species to develop molecular phylogenetic trees
in fact, evolutionary ideas underlie…
…many of the methods used in bioinformatics
in fact, evolutionary ideas underlie many of the methods used in bioinformatics; we use them to …
…compare sequences, identify families of genes and proteins and establish homology between genes in different organisms.
What is meant by degenerate?
several codons may code for a single amino acid meaning that a nucleotide change may not result in a change of amino acid
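Degeneracy can be checked directly against a codon table. The sketch below uses only a small, partial excerpt of the standard genetic code (mRNA codons); the function names are hypothetical and chosen for illustration.

```python
# Partial excerpt of the standard genetic code (mRNA codons).
# Only the entries needed for this illustration are included.
CODON_TO_AMINO_ACID = {
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "UUA": "Leu", "UUG": "Leu",
    "CCU": "Pro",
    "AUG": "Met",
}


def codons_for(amino_acid):
    """Return every codon in the excerpt that encodes the given amino acid."""
    return sorted(c for c, aa in CODON_TO_AMINO_ACID.items() if aa == amino_acid)


def is_silent_change(codon_before, codon_after):
    """True if swapping one codon for the other leaves the amino acid unchanged."""
    return CODON_TO_AMINO_ACID[codon_before] == CODON_TO_AMINO_ACID[codon_after]


print(codons_for("Leu"))               # six different codons encode leucine
print(is_silent_change("CUU", "CUC"))  # third-base change, still leucine
print(is_silent_change("CUU", "CCU"))  # second-base change, leucine becomes proline
```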
Since the genetic code is degenerate, sometimes it is more informative to …
…examine the amino acid sequence of the protein gene product.
Computer software can convert …
… a DNA sequence into an amino acid sequence.
How many possible reading frames are there on a single strand of DNA?
3 possible reading frames
Three possible reading frames x two strands of DNA = ?
Six possible translations.
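The six translations can be sketched in a few lines of Python. This is a minimal illustration using the standard genetic code, with one-letter amino acid codes and `*` for a stop codon; the function names and the example sequence are hypothetical, and real tools handle ambiguous bases and alternative genetic codes that this sketch ignores.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at the given offset (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translation(dna):
    """Three reading frames on each of the two strands."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand, offset)
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```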
As well as reading the code, the software looks for …
… start signals (AUG) and stop signals (UGA, UAG, UAA), to find the open reading frames (ORFs).
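A minimal sketch of an ORF search, assuming a single mRNA strand and the start and stop codons listed on this card. The function name and example sequence are hypothetical; real ORF finders also scan the opposite strand and usually apply a minimum length.

```python
START_CODON = "AUG"
STOP_CODONS = {"UGA", "UAG", "UAA"}


def find_orfs(rna):
    """Return (start, end) index pairs for each ORF in the three frames.

    An ORF here runs from the first AUG in a frame up to and including
    the next in-frame stop codon. `end` is exclusive, so rna[start:end]
    gives the ORF sequence.
    """
    rna = rna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))
                start = None
    return orfs


example = "GGAUGGCCUAAGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```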
one of the most fundamental and frequent bioinformatic analyses is …
…sequence alignment
How do sequence alignments work?
here we take two (or more) DNA/protein sequences and compare them using a scoring system to determine the degree of homology (identity and similarity).
What is the first step of sequence alignments?
the first step is to compare sequences to find out how alike they are.
we are often interested in parts of the sequence which are …
… well conserved for a particular type of protein - these are called motifs.
We are often interested in parts of the sequence which are well conserved for a particular type of protein - these are called motifs.
For example …
… members of the thioredoxin protein family have a C-X-X-C motif (cysteine – any amino acid – any amino acid – cysteine).
within the above sequence, there is some…
…homology between the two amino acid sequences
within the above sequence, there is some homology between the two amino acid sequences - the test sequence contains a …
…C-X-X-C motif and shares some identity (exact matches) between itself and the hPDI sequence.
What is identity?
Exact matches between two sequences.
within the above sequence, there is some homology between the two amino acid sequences - the test sequence contains a C-X-X-C motif and shares some identity (exact matches) between itself and the hPDI sequence
10 (red letters) out of the 20 amino acids match giving a score of …
…50 % identity
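The identity score is simply matches divided by aligned length. A minimal sketch, assuming two sequences that are already aligned to the same length with no gaps; the function name and the two 10-residue sequences are hypothetical and are not the test and hPDI sequences from the card.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions holding exactly the same residue."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100 * matches / len(seq_a)


# 8 of the 10 aligned positions are identical.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIKV"))
```

By the same arithmetic, 10 matches out of 20 positions gives 50 % identity.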
Amino acids differ in their …
… R groups
Amino acids differ in their R- groups, which can be classified as …
…hydrophobic, polar, positively charged or negatively charged
Amino acids differ in their R- groups, which can be classified as hydrophobic, polar, positively charged or negatively charged. Therefore, some amino acid changes are …
…more severe than others
What happens if we change one amino acid for another with similar properties?
may not affect protein function.
amino acids differ in their R- groups and changing one amino acid for another may not be so detrimental to the protein if the amino acid is …
… similar in chemical/physical character
in this case, the green coloured amino acids are…
…similar in chemical character
in this case, the green coloured amino acids are similar in chemical character.
we include these as …
…‘positives’ along with the identical (red) matches
so 13 (red and green letters) out of the 20 amino acids within the sequence match, giving a ‘positives’ or ‘similarity’ score of …
…65%
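A 'positives' score can be sketched by counting positions where the two residues are either identical or fall in the same property class. The grouping below is an assumed, simplified assignment of the 20 amino acids to the four classes named earlier (textbooks differ on a few residues), and the function name and sequences are hypothetical. Alignment tools generally score similarity with a substitution matrix rather than fixed groups.

```python
# Assumed simplified grouping; some residues are classified differently elsewhere.
PROPERTY_GROUPS = {
    "hydrophobic": "AVLIMFWPG",
    "polar": "STNQYC",
    "positive": "KRH",
    "negative": "DE",
}
GROUP_OF = {
    residue: group
    for group, members in PROPERTY_GROUPS.items()
    for residue in members
}


def percent_positives(seq_a, seq_b):
    """Percentage of aligned positions that are identical or chemically similar."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    positives = sum(
        1 for a, b in zip(seq_a, seq_b) if GROUP_OF[a] == GROUP_OF[b]
    )
    return 100 * positives / len(seq_a)


# 8 identical positions plus one similar pair (L and V are both hydrophobic).
print(percent_positives("ACDEFGHIKL", "ACDEYGHIKV"))
```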
What does a positives or similarity score of 65% mean?
these high values suggest that the two proteins belong to the same protein family, are likely to share some functionality and were derived from the same evolutionary ancestor
we may want to compare DNA/amino acid sequences from many different species to…
…see how homologous they are
we may want to compare DNA/amino acid sequences from many different species to see how homologous they are
we can do this using a web-based program called …
…ClustalW2 at the EBI
ClustalW2 at the EBI performs…
…multiple sequence alignments
once the alignment is performed you can use…
…various tools within the program to highlight areas of percent identity/similarity
Scientists have found patterns in amino acid sequences which recur in …
…proteins with the same function.
Small sequences of conserved amino acids are called …
…motifs
Define motif
Small sequences of conserved amino acids.
Purpose of C-X-X-C motif?
This motif is often used in proteins to take part in oxidation and reduction reactions (redox). Therefore, we can deduce that the test gene which we have just identified may have a role in redox reactions.
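Searching a protein sequence for a C-X-X-C motif is a simple pattern match. A minimal sketch using Python's `re` module; the function name and the short example sequence are hypothetical, and a match only suggests a possible redox role.

```python
import re

# C, then any two residues, then C. The lookahead lets overlapping hits be found.
CXXC_PATTERN = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(protein):
    """Return (position, motif) for every C-X-X-C occurrence (0-based positions)."""
    return [(m.start(), m.group(1)) for m in CXXC_PATTERN.finditer(protein.upper())]


print(find_cxxc("AWCGPCKM"))
```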
we can use programs to compare a DNA/protein sequence to find …
…others that are similar within and between different species
What does BLAST stand for?
Basic Local Alignment Search Tool
What is BLAST (Basic Local Alignment Search Tool)?
A statistically driven search and alignment tool that compares the input sequence against the available sequence databases to find similar sequences
Function of Translate?
this program translates a DNA sequence into the corresponding amino acid sequence in all six possible reading frames
Function of ProtParam ?
analyses the primary amino acid sequence to give useful data such as the size of the protein, its isoelectric point, its extinction coefficient, the number of hydrophobic/hydrophilic residues and how stable it may be
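One of these values, the extinction coefficient at 280 nm, is commonly estimated from the number of tryptophan, tyrosine and cystine (disulphide-bonded cysteine pair) residues. A minimal sketch, assuming the widely used per-residue values of 5500 (Trp), 1490 (Tyr) and 125 (cystine) M⁻¹ cm⁻¹; the function name and example sequence are hypothetical, and this is an illustration of the calculation rather than ProtParam's own code.

```python
# Assumed molar extinction contributions at 280 nm, in M^-1 cm^-1.
TRP_E280 = 5500
TYR_E280 = 1490
CYSTINE_E280 = 125


def extinction_coefficient_280(protein, all_cysteines_paired=False):
    """Estimate the molar extinction coefficient at 280 nm from the sequence.

    If all_cysteines_paired is True, every pair of cysteines is assumed to
    form a cystine; otherwise cysteines are assumed reduced and contribute 0.
    """
    protein = protein.upper()
    total = protein.count("W") * TRP_E280 + protein.count("Y") * TYR_E280
    if all_cysteines_paired:
        total += (protein.count("C") // 2) * CYSTINE_E280
    return total


print(extinction_coefficient_280("AWYYCC"))
print(extinction_coefficient_280("AWYYCC", all_cysteines_paired=True))
```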
Function of PSORT?
using the amino acid sequence, this program looks for known signals within the sequence and predicts where your protein will end up in the cell – i.e. nucleus, ER, plasma membrane, secreted etc
Function of TMpred or TMHMM?
searches for regions of hydrophobic amino acids to predict if the resulting protein is likely to be integral within a membrane
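The general idea of scanning for hydrophobic stretches can be sketched with a sliding-window hydropathy average. This uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a threshold of 1.6, which is a different and much simpler technique than the statistical models behind TMpred or TMHMM; the function name and example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return start positions of windows whose mean hydropathy meets the threshold."""
    protein = protein.upper()
    starts = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[residue] for residue in segment) / window
        if mean >= threshold:
            starts.append(start)
    return starts


# A run of leucines flanked by charged residues: a candidate membrane span.
print(hydrophobic_windows("DDDD" + "L" * 20 + "KKKK"))
# A purely charged sequence gives no candidate windows.
print(hydrophobic_windows("DEKR" * 10))
```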
Function of PSIPRED or PredictProtein?
Predicts what regions of the primary sequence fold into secondary structures (α-helices and β-sheets) or even tertiary/quaternary structures.
Function of Protein Data Bank (PDB)?
stores data about the structure of proteins that come from either X-ray crystallography or NMR experiments
The protein data bank stores…
…the coordinates of every atom within a protein and allows you to build 3D models of proteins that you can examine using a Jmol viewer
structural data is very important in trying to understand…
…how proteins work as ‘machines’ at the molecular level especially when considering inhibitors or mutations that may alter the structure and therefore activity
we can use the structures of known proteins to build …
…structural models for new amino acid sequences to get an idea of what the eventual protein could look like and thus what function it may perform
despite becoming more sophisticated and reliable, many of these bioinformatic programs are based on …
…statistical packages and can only PREDICT the structure/function/localisation of a protein - laboratory experiments are STILL required to confirm the predictions made by the programs
The activities of the cell are determined by …
…when the genes are expressed or stop being expressed.
DNA microarray technology enables us to …
…examine gene expression in different circumstances.
How does DNA microarray technology work?
Oligonucleotides specific to each gene from the genome are fixed onto a ‘chip’ and then probed with free fluorescent oligonucleotides derived from mRNA of a control and test cell.
In microarrays, what does green represent?
gene expressed only under test conditions
In microarrays, what does red represent?
gene expressed only under control conditions
In microarrays, what does yellow represent?
genes expressed under control and test conditions
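The colour scheme on these cards can be written as a small decision rule. A minimal sketch, assuming each spot has one fluorescence reading per sample and that a single hypothetical cut-off decides whether a gene counts as expressed; real analyses work with intensity ratios and statistics rather than a hard threshold, and the function name is made up for illustration.

```python
def spot_colour(control_signal, test_signal, threshold=100.0):
    """Classify a microarray spot using the colour scheme from the cards above.

    threshold is a hypothetical cut-off above which a gene counts as expressed.
    """
    control_on = control_signal >= threshold
    test_on = test_signal >= threshold
    if control_on and test_on:
        return "yellow"  # expressed under both conditions
    if test_on:
        return "green"   # expressed only under test conditions
    if control_on:
        return "red"     # expressed only under control conditions
    return "dark"        # not detectably expressed in either sample


print(spot_colour(control_signal=20, test_signal=850))
print(spot_colour(control_signal=900, test_signal=15))
print(spot_colour(control_signal=700, test_signal=650))
```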
What is RNA-seq?
RNA-seq is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA in a biological sample at a given moment, giving a snapshot of the continuously changing cellular transcriptome.
A number of databases provide…
…open access RNA-seq data for analysis.