Bioinformatics. Flashcards
Define the DNA reading frame.
One of the ways of dividing a DNA strand into consecutive, non-overlapping triplets (codons). Each strand of DNA has 3 possible reading frames.
How many reading frames does a DNA molecule have?
6.
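As a minimal sketch of why the count is 6, the Python below splits a short sequence into codons at each of the 3 offsets on the given strand and again on its reverse complement. The function names and the example sequence are illustrative assumptions, and the code assumes the sequence contains only A, C, G and T.

```python
def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def codons(strand: str, offset: int) -> list[str]:
    """Split a strand into complete triplets, starting at offset 0, 1 or 2."""
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


def six_reading_frames(seq: str) -> dict[str, list[str]]:
    """Map frame labels (+1..+3 and -1..-3) to their codon lists."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for sign, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = codons(strand, offset)
    return frames


# Hypothetical 9-base example: 3 frames per strand, 2 strands.
for label, frame in six_reading_frames("ATGGCCTAA").items():
    print(label, frame)
```

Incomplete triplets at the end of a frame are dropped, which is a simplification chosen for this sketch.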
Define genome annotation.
The process of obtaining biological information from unprocessed, sequenced genetic data.
Define the MASCOT search engine.
A search engine that identifies proteins by matching mass spectrometry data against protein sequence databases.
What is bioinformatics?
The process of solving biological problems by utilising information stored on computer databases.
What is used in bioinformatics to increase biological understanding?
Biological databases.
Who coined the term bioinformatics in 1970?
Paulien Hogeweg and Ben Hesper.
How has bioinformatics helped the field of medicine?
By allowing us to compare biological processes in healthy and diseased bodies.
How is the information in bioinformatic databases used to help advance medicine?
The information has been collected from past patients and can be used to diagnose the same disease in others.
What kind of maps has the information from bioinformatic databases helped to create?
Genetic maps that show heritable traits.
How can bioinformatic databases help taxonomists classify species?
They can store the genome sequences of different organisms.
This allows for comparisons to be made between different organisms.
How has bioinformatics helped law enforcement agencies?
Police forces use databases to store the DNA profiles of convicted offenders, making it easier to identify repeat offenders.
How can bioinformatics help molecular biologists conduct their experiments?
Primer designs are stored in databases, which allows scientists to design a primer easily.
How has bioinformatics helped pharmacologists?
They can use bioinformatics to design new drugs that are personalised for a person's genome.
How has bioinformatics helped farmers?
It has helped farmers develop new strains of crops which are disease or pest resistant.
From what biological sources will bioinformatics use data?
DNA.
RNA.
Protein.
How can bioinformatics help scientists sequence DNA strands?
By storing the sequence data from past DNA experiments, which allows scientists to compare new sequences against known ones.
How has bioinformatics helped with the study of proteins?
Storing information relating to proteins allows other researchers to identify the same protein quickly.
What does the storage of information relating to proteins allow scientists to study about how proteins are changing?
It allows them to study the evolution of proteins and the mutations that can arise within their structure.
What are 5 ways that bioinformatics can help to study DNA?
Analysis of a DNA sequence.
The discovery of new genes.
The discovery of regulatory regions within the DNA strand.
The ability to annotate whole genomes.
To carry out comparative genomics.
The storage of DNA sequences in bioinformatic databases allows scientists to make what comparisons?
It allows scientists to compare genome sequences between different people.
What does the comparison of DNA sequences from different people allow for?
The detection of regions in the genome that are associated with genetic diseases such as sickle cell disease.
How does the study of different DNA strands help pharmacologists?
It allows them to develop new drugs that are likely to be absorbed and metabolised by the patient.
How does the storage of different genomes help us to study the physical genome?
It helps us find new genes or regulatory regions such as a TATA box or a binding domain for a regulatory protein.
What are 6 factors about RNA that bioinformatics can help study?
The different products of RNA splicing.
The expression of different RNAs in different tissues.
The structure of different RNAs.
The types of RNA that are produced by a single gene.
The RNAs that are produced by thousands of genes at the same time.
The creation of specific DNA chips and microarrays for RNA analysis.
Why is it important to store the information that relates to the products of RNA splicing?
To know all of the products that can be produced by a single gene.
To know which RNAs are produced in response to an external stimulus.
To build genetic probes and microarrays.
What are 4 factors about proteins that bioinformatics can help with?
The identification of protein families.
The identification of various protein domains and regions.
The identification of various protein structures.
The identification of various protein functions.
How is bioinformatics used in the identification of proteins?
The results from electrophoresis and mass spectrometry are fed into a database for protein identification.
What are the 2 categories of bioinformatic databanks that store information relating to DNA, RNA and proteins?
Databases that store information relating to nucleic acids.
Databases that store information relating to proteins.
What is GenBank?
The NIH genetic sequence database, which is part of the International Nucleotide Sequence Database Collaboration (INSDC).
What 3 databanks help to make up the INSDC?
The DNA Data Bank of Japan (DDBJ).
The European Molecular Biology Laboratory (EMBL).
The GenBank at NCBI in the USA.
What does the INSDC allow researchers to do?
To identify a particular nucleotide sequence by searching through millions of different nucleotide sequences.
In what format is genetic information at the NCBI stored?
In the GenBank flat file format, which displays the sequence together with information about the nucleic acid such as its source organism, features and function.
What is UniProt?
A database that combines the protein information held by major international databases.
What 5 major databanks does UniProt obtain information from?
European Bioinformatics Institute (EBI).
Protein Information Resource (PIR).
Georgetown University Medical Centre (GUMC).
National Biomedical Research Foundation (NBRF).
The Swiss Institute of Bioinformatics (SIB).
What is ExPASy?
A resource portal from the SIB that provides access to databases and software tools covering all areas of the life sciences.
What does ExPASy stand for?
The Expert Protein Analysis System.
What is a query sequence?
A particular sequence that is chosen by the experimenter and entered into a BLAST search.
What can a query sequence be made up of?
Of amino acids or nucleotides.
When is a BLAST search performed?
When a researcher wants to discover more information that relates to their query sequence.
BLAST searches require query sequences to be of what length?
At least 15 nucleotides or amino acids.
How do scientists perform a BLAST search?
They submit the query sequence to a search against a database.
This allows them to compare their sequence to all of the known sequences within the database.
What are the 2 formats in which a query sequence can be entered into a BLAST search?
The FASTA format.
The identifier format.
What does the FASTA format of a BLAST search consist of?
A single description line beginning with ">" followed by lines of nucleotide or amino acid sequence.
How does the identifier format differ from the FASTA format?
It supplies an accession number or gene ID in place of the sequence, which the database then looks up.
What is alignment?
The arrangement of 2 sequences so that they can be compared to show their regions of similarity.
How is alignment performed?
By comparing a query sequence with a known sequence.
What is the score value?
A value that measures the quality of the alignment between a query sequence and a search result.
What is the score in the score value usually based on?
On the number of nucleotide or amino acid matches between the query sequence and the search results.
Does a high score value mean good or bad alignment?
The higher the score the better the alignment.
What does the score value allow for when selecting matches between search results and the query sequence?
For the selection of the best match between the query sequence and the search results.
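The match-counting idea behind the score can be sketched as follows. This is a simplified illustration, not how BLAST scores hits: real searches use substitution matrices and gap penalties. The function names and the hit names and sequences are made up, and each hit is assumed to be already aligned to the query at the same length.

```python
def match_score(query: str, subject: str) -> int:
    """Count positions where two already-aligned sequences agree.

    A '-' marks a gap and never counts as a match.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def best_match(query: str, hits: dict[str, str]) -> str:
    """Return the name of the hit with the highest match score."""
    return max(hits, key=lambda name: match_score(query, hits[name]))


# Hypothetical search results, pre-aligned to the query.
hits = {"hit_A": "ACGTTGCA", "hit_B": "ACGATGGA", "hit_C": "TCGA-GGA"}
query = "ACGTTGCA"
for name, aligned in hits.items():
    print(name, match_score(query, aligned))
print("best:", best_match(query, hits))
```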
What is the E-value?
The expectation value.
It is the number of alignments with an equal or better score that would be expected to occur by chance in a database of that size.
What does the expectation value highlight?
The significance of an alignment.
A high E-value means that many alignments of that quality could arise by chance.
An E-value close to 0 means that the alignment is unlikely to have arisen by chance.
Does a low or high E-value indicate a good match between the query sequence and the search results?
The lower the E-value the better the match.
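The link between score and E-value can be sketched with the standard bit-score relationship E = m × n × 2^(−S′), where S′ is the bit score, m is the query length and n is the total length of the database. The function name and the numbers below are illustrative assumptions. BLAST itself applies corrected "effective" lengths, so its reported values will differ somewhat.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value: E = m * n * 2 ** (-bit_score)."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 100-residue query against 1,000,000 residues.
for bit_score in (20, 40, 60):
    print(bit_score, expect_value(bit_score, 100, 1_000_000))
```

A higher score gives a lower E-value, and a larger database gives a higher E-value for the same score.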
What is genome annotation?
The process of obtaining biological information from unprocessed sequence data.
What is the ultimate goal of genome annotation?
To create a labelled genome, where biological information is linked to a particular genetic sequence.
What will a genome annotated map tell us?
Exactly what each gene does.
What 2 categories can genome annotation be divided into?
Structural and functional annotations.
What do structural genome annotations attempt to identify?
Genomic elements such as the promoter sequence or the TATA box.
What 3 things does structural genome annotation involve?
Researching the structure of genes.
Researching the areas of the genome that code for certain products.
Researching the location of regulatory motifs such as the TATA box.
What do the results from structural genome annotations allow for?
The identification of the cis-acting elements that are used in transcription and translation.
What do functional genome annotations allow for?
The identification of the biological functions of genomic products such as proteins.
What is the basic tool of genome annotations?
The BLAST program which allows us to identify similarities between genes and proteins.
What is gene prediction software used for?
To identify genes in long DNA sequences by finding stretches that code for amino acids and are not interrupted by STOP-codons.
What factor inherent to DNA is used by gene prediction software when identifying genes on an unknown strand?
The DNA reading frame which allows for the strand to be read in triplets or codons.
How many reading frames does each DNA strand have?
3.
What does the correct reading frame from a DNA strand tell you about the strand?
The amino acid sequence that the DNA strand codes for.
How many reading frames must gene prediction software evaluate if it is analysing an entire DNA molecule?
6 possible reading frames for a DNA molecule as it consists of 2 strands.
What happens once the correct reading frame has been interpreted by gene prediction software?
We can identify the protein that is encoded by a gene from the amino acid sequence that is produced.
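As a rough sketch of that step, the Python below reads one chosen reading frame codon by codon and converts it to a string of one-letter amino acid codes using the standard genetic code. The names `CODON_TABLE` and `translate` and the example sequence are illustrative assumptions. The resulting amino acid string is what would then be compared against a protein database.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base (T, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate one reading frame, stopping at the first STOP codon ('*')."""
    dna = dna.upper()
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence read in its first frame.
print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))
```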
What are 2 tools that can help identify the open reading frame?
ORF Finder (the Open Reading Frame Finder at NCBI) and GENSCAN.
What complicates gene prediction software?
The fact that most of a genome is made from non-coding DNA.
This means gene prediction software must distinguish coding DNA from non-coding DNA.
What are 10 common features found in coding DNA?
The open reading frame.
A start codon.
A stop codon.
A terminator sequence (prokaryotes).
A TATA box (eukaryotes).
A Shine-Dalgarno sequence (prokaryotes).
A Kozak sequence (eukaryotes).
A poly-A addition sequence (eukaryotes).
Intron and exon boundaries.
CpG islands.
What kind of genome is ORF finding very good at analysing?
The bacterial genome.
Why is ORF finding not good for analysing the eukaryotic genome?
Because eukaryotic genes contain introns that interrupt the coding sequence and are spliced out of the mRNAs that code for proteins.
What makes a good ORF?
It should begin with a START-codon (ATG, which codes for methionine) and end with an in-frame STOP-codon.
How does the presence of many STOP-codons that are located close together on an unknown strand affect the ORF?
It suggests that an ORF is not present.
Why are longer ORFs better than short ORFs?
The longer the ORF, the less likely it is to have occurred by chance.
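The ORF rules above (a START-codon, an in-frame STOP-codon, and a preference for longer ORFs) can be sketched as a simple scan of the 3 frames of one strand. This is an illustrative assumption of how such a search might look, not the algorithm used by any particular tool. The function names and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(strand: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...STOP stretch on one strand.

    start and end are 0-based slice positions, and end includes the STOP codon.
    min_codons counts the codons from ATG up to, but not including, the STOP.
    """
    strand = strand.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(strand) - 2, 3):
            codon = strand[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first START codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, i + 3))
                start = None  # the in-frame STOP closes the candidate
    return orfs


def longest_orf(strand: str):
    """Pick the longest ORF, since longer ORFs are less likely to be chance."""
    return max(find_orfs(strand), key=lambda orf: orf[2] - orf[1], default=None)


print(find_orfs("ATGGCCAAATAG"))
print(longest_orf("CATGAAATGA"))
```

To cover all 6 reading frames, the same scan would also be run on the reverse complement of the strand.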
What does sequence alignment software allow for?
For a newly sequenced gene to be analysed to see if it is already known and stored in a database.
What are the 2 most popular tools that are used for sequence alignment?
BLAST (Basic Local Alignment Search Tool).
FASTA (Fast All).
What is the most widely used program in bioinformatics?
The BLAST program.
What does the BLAST program do?
It searches through a database to find sequences that match or are similar to the one that is being tested.
How do results from the BLAST program appear?
As high scoring segment pairs (HSPs).
Where the score is based on the number of matches between the query sequence and the database sequence.
What can we do after the matches have been produced by BLAST?
We can evaluate the matches.
What is the main idea behind the BLAST tool?
To find regions of similarity between the sample sequence and the known sequences from the database.
What are local similarities, as detected by BLAST?
Where the 2 sequences share a region of similarity that is confined to one part of each sequence.
What are global similarities?
Where the 2 sequences are similar along their entire length.
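The difference between local and global similarity can be illustrated with a small dynamic-programming sketch, in the style of Smith-Waterman for local scoring and Needleman-Wunsch for global scoring. BLAST itself uses a faster heuristic, so this is not its implementation. The function name, the scoring values (match +1, mismatch -1, gap -1) and the example sequences are assumptions chosen for illustration.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best alignment score of a and b, either local or global."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Local: best cell anywhere. Global: the cell covering both full sequences.
    return best if local else score[-1][-1]


# Two hypothetical sequences that share only the central region ACGT.
a, b = "GGGACGTCCC", "TTTACGTAAA"
print("local:", alignment_score(a, b, local=True))
print("global:", alignment_score(a, b, local=False))
```

The shared central region gives a good local score, while the global score is pulled down by the dissimilar ends.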
What kind of database does BLAST-P search, and what kind of query does it use?
A protein database.
A protein query.
What kind of database does BLAST-X search, and what kind of query does it use?
A protein database.
A translated nucleotide query.
What is compared via the use of BLAST-X?
This method compares the 6-frame translations of DNA to a protein database.
What kind of database does tblastn search, and what kind of query does it use?
A translated nucleotide database.
A protein query.
What kind of database does tblastx search, and what kind of query does it use?
A translated nucleotide database.
A translated nucleotide query.
What is compared via the use of tblastx?
This method compares the 6 frame translations of a DNA query to the 6 frame translations of a DNA database.
Each comparison made by tblastx is comparable to a search with which other BLAST technique?
To a BLAST-P search.
What kind of database does FASTA search, and what kind of query does it use?
DNA or protein database.
A DNA or protein query.
What kind of database does FASTX search, and what kind of query does it use?
A protein database.
A translated DNA query.
What kind of database does TFASTA search, and what kind of query does it use?
A translated DNA database.
A protein query.
What marks the beginning of a query sequence that uses the FASTA format?
A single line description that is followed by the lines of sequence data.
How is the description line distinguished from the sequence data line in a query sequence in FASTA format?
Because the description line has a greater than (“>”) symbol in the first column.
How many characters should each line of FASTA input have?
Each line should not exceed 80 characters.
Can blank lines be entered into the FASTA format?
No.
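The FASTA layout described in these cards (a ">" description line followed by lines of sequence data) can be read with a few lines of Python. This is a minimal sketch and not the parser of any particular tool. The function name and the example record are hypothetical, and the sketch skips blank lines instead of rejecting them.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each description line (without the '>') to its joined sequence."""
    records = {}
    description = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # this sketch tolerates blank lines rather than failing
        if line.startswith(">"):
            description = line[1:].strip()
            records[description] = ""
        elif description is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            records[description] += line
    return records


# Hypothetical two-record input.
example = """>example_1 made-up nucleotide record
ATGGCC
TAA
>example_2 made-up protein record
MAIVMGR
"""
for description, sequence in parse_fasta(example).items():
    print(description, "->", sequence)
```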
What is the input sequence for the BARE format of a BLAST search?
Lines of sequence data without the FASTA definition line.
What is the input sequence for the IDENTIFIER format of a BLAST search?
They are accession numbers, accession versions or GI numbers.
These are sequence ID tags that the database has attached to a particular gene or protein.
What is sequence homology?
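Similarity between DNA sequences from different organisms that is due to shared ancestry. Analysing it helps to determine evolutionary relationships.
What is one complication when using sequence homology?
Bacteria can exchange DNA sequences via horizontal gene transfer, which can obscure evolutionary relationships.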
What software is often used by scientists to investigate phylogenetic relationships?
Multiple sequence alignment software such as CLUSTAL and COBALT.
What do taxonomists create to show how closely different organisms are related?
Phylogenetic trees.