Lecture 9 Flashcards
Define Bioinformatics (include the fields it encompasses)
Bioinformatics: an interdisciplinary field that uses computational tools for understanding biological data.
Bioinformatics combines biology, computer science, information engineering, math, and statistics to interpret biological data
Define Proteomics and Proteome
Proteomics: the large-scale study of proteins, including their structures, functions, and interactions
Proteome: the entire set of proteins produced by an organism
State the 2 functions of Ang (angiogenin)
hydrolyzes RNA (it acts as a ribonuclease)
interacts with DNA, causing a promoter-like increase in the expression of rRNA
Describe on a molecular level, how Angiogenin interacts with DNA to enhance the expression of rRNA
It enhances rRNA transcription by binding to the CT-rich angiogenin-binding element (ABE) within the upstream intergenic region of rDNA
Define homologs (homologous proteins)
Homologs: two molecules (genes or proteins) that are descended from a common ancestor
Compare Paralogs and Orthologs
Paralogs: homologs present WITHIN ONE SPECIES that have a common origin (a duplication event) but may have evolved different functions
(so paralogs may have similar structure but different functions)
Orthologs: homologs present in DIFFERENT species (separated by a speciation event) that have very similar or identical functions
(more like identical twins)
Describe the process of sequence alignment and state why it is useful
Sequence Alignment: a process that systematically aligns sequences in order to search for similarities
Sequence comparisons (conducted via sequence alignment) help determine whether the similarities between sequences reflect a real relationship or are simply due to chance
True or False:
Sequence identities can be established by sliding one sequence past the other and counting the number of matches. Explain.
True
While there are now more efficient ways to find similarities, sliding one sequence past the other and counting matches at each position does reveal them
(Myoglobin and Alpha-hemoglobin are 25.9% identical and many of these similarities were identified via the “sliding method”)
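The sliding method can be sketched in a few lines of Python. This is only a minimal illustration: the function name `best_ungapped_offset` and the short peptide strings in the usage note are made up for the example and are not real myoglobin or hemoglobin sequences.

```python
def best_ungapped_offset(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Slide seq_b past seq_a and return (offset, matches) for the best offset.

    An offset of k places seq_b[0] under seq_a[k]; negative offsets shift
    seq_b to the left of seq_a. No gaps are introduced.
    """
    best_offset, best_matches = 0, -1
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = 0
        for j, residue_b in enumerate(seq_b):
            i = offset + j
            # Only count positions where the two sequences overlap.
            if 0 <= i < len(seq_a) and seq_a[i] == residue_b:
                matches += 1
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches
```

For the toy strings `"GHLKAVDPENF"` and `"KAVDPQNF"`, `best_ungapped_offset` returns `(3, 7)`: shifting the second string three positions to the right lines up 7 identical residues.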
Introducing “gaps” into one of the sequences has been found to create better alignments between the sequences. What is a common issue with the gap introduction method? How do “scoring systems” account for this issue?
Gaps may generate artificial similarities: with enough gaps, almost any two sequences can be made to line up
Scoring systems penalize gaps to compensate. In one example scheme, each identity between aligned residues earns 10 points and 25 points are DEDUCTED for each gap
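A rough sketch of that scoring scheme, using the 10-point and 25-point values from the card. Two things here are assumptions for illustration: the function name `alignment_score`, and the choice to count a run of consecutive gap characters in the same sequence as a single gap.

```python
MATCH_POINTS = 10  # points awarded per identical aligned pair (from the card)
GAP_PENALTY = 25   # points deducted per gap (from the card)


def alignment_score(aligned_a: str, aligned_b: str, gap_char: str = "-") -> int:
    """Score two already-aligned sequences of equal length.

    Assumption: consecutive gap characters in the same sequence form ONE gap,
    so the penalty is charged once per run, not once per gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    previous_gap_in = None  # which sequence held a gap in the previous column
    for a, b in zip(aligned_a, aligned_b):
        if a == gap_char:
            gap_in = "a"
        elif b == gap_char:
            gap_in = "b"
        else:
            gap_in = None
        if gap_in is None:
            if a == b:
                score += MATCH_POINTS
        elif gap_in != previous_gap_in:
            # A new gap opens here, so charge the penalty once.
            score -= GAP_PENALTY
        previous_gap_in = gap_in
    return score
```

For example, aligning `"GAT--ACA"` with `"GATTTACA"` gives 6 identities and 1 gap, so the score is 6 × 10 − 25 = 35. A gap only pays off when it creates at least three extra identities.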
Describe how the statistical significance of alignments between sequences can be estimated by shuffling.
Randomly shuffle one of the sequences (this keeps its amino acid composition but destroys the order), realign it, and record the score. Repeat many times, then compare the distribution of shuffled scores with the score of the original alignment to determine whether the original alignment was due to chance or is actually significant
(if the original score is not sufficiently different from the randomized scores, the original alignment could be a result of chance)
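The shuffling idea can be sketched as below. This is a simplified illustration, not a standard tool: the score is just the best identity count from the ungapped sliding method, and the names `identity_score` and `shuffle_test`, the number of shuffles, and the fixed seed are all assumptions made for the example.

```python
import random


def identity_score(seq_a: str, seq_b: str) -> int:
    """Best number of identities over all ungapped offsets (sliding method)."""
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = sum(
            1
            for j, residue in enumerate(seq_b)
            if 0 <= offset + j < len(seq_a) and seq_a[offset + j] == residue
        )
        best = max(best, matches)
    return best


def shuffle_test(seq_a: str, seq_b: str, n_shuffles: int = 200, seed: int = 0):
    """Compare the real score with scores obtained after shuffling seq_b.

    Returns (original_score, shuffled_scores, fraction), where fraction is the
    share of shuffled sequences that scored at least as well as the original.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    original = identity_score(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        shuffled_scores.append(identity_score(seq_a, "".join(residues)))
    at_least_as_good = sum(score >= original for score in shuffled_scores)
    return original, shuffled_scores, at_least_as_good / n_shuffles
```

A small fraction means shuffled sequences almost never score as well as the real one, so the original alignment is unlikely to be a chance result. A fraction near 1 means the original score is no better than random.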
Describe how distant evolutionary relationships can be detected through the use of the following substitution matrices
More sensitive scoring system:
Conservative substitution:
Non Conservative substitution:
More sensitive scoring system: takes into account the degree of similarity between amino acids (AAs), rather than counting only exact identities
Conservative substitution: replaces one AA with a similar one
Non Conservative substitution: replaces an AA with another AA with different chemical properties
AA substitutions can also be classified by what?
AA substitutions can be classified by the fewest number of nucleotide changes to achieve the AA substitution
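As a sketch of this classification, the snippet below builds the standard genetic code and finds the fewest nucleotide changes needed to turn any codon for one amino acid into any codon for another. The function names `codons_for` and `min_nucleotide_changes` are made up for the example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_nucleotide_changes(aa_from: str, aa_to: str) -> int:
    """Fewest single-base changes needed to convert aa_from into aa_to."""
    return min(
        sum(x != y for x, y in zip(codon_a, codon_b))
        for codon_a in codons_for(aa_from)
        for codon_b in codons_for(aa_to)
    )
```

For example, lysine to arginine needs only 1 change (AAA to AGA), lysine to tryptophan needs 2 (AAG to TGG), and phenylalanine to lysine needs all 3 bases changed.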
Describe the scoring system of a substitution matrix (such as BLOSUM62)
BLOSUM62 is a scoring system that awards points for substitutions that are commonly found in nature and subtracts points for substitutions that rarely occur
What does the substitution matrix reveal about alpha-hemoglobin and myoglobin?
The substitution matrix reveals that many of the differences between alpha-hemoglobin and myoglobin are conservative
True or False:
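Substitution matrices can reveal homologies that are not identified by sequence alignments only. Explain.
True
Substitution matrices give credit for conservative substitutions, so they can detect distant relationships that counting identities alone would miss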
Describe what a positive and negative score from a substitution matrix such as BLOSUM62 means.
A positive score indicates a substitution that occurs relatively often, typically a conservative one (lysine for arginine)
A negative score indicates a substitution that rarely occurs, typically a nonconservative one (lysine for tryptophan)
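A small sketch of how matrix scores are looked up and summed over an ungapped alignment. The dictionary below is only a hand-copied excerpt of a few BLOSUM62 entries, not the full 20 × 20 matrix, and the names `pair_score` and `aligned_score` are made up for the example.

```python
# Excerpt of BLOSUM62 (a few entries only; the real matrix covers all pairs).
BLOSUM62_EXCERPT = {
    ("K", "K"): 5,
    ("R", "R"): 5,
    ("W", "W"): 11,
    ("K", "R"): 2,   # conservative: both are positively charged
    ("K", "W"): -3,  # nonconservative: charged vs. large aromatic
    ("R", "W"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up the score for one aligned pair; the matrix is symmetric."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def aligned_score(seq_a: str, seq_b: str) -> int:
    """Sum the matrix scores over an ungapped alignment of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))
```

With this excerpt, lysine for arginine scores +2 and lysine for tryptophan scores −3, matching the positive and negative cases on the card. Aligning `"KKW"` with `"KRW"` scores 5 + 2 + 11 = 18, so the conservative K/R substitution still adds to the total instead of counting as a plain mismatch.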
Describe what BLAST searches are
BLAST (Basic Local Alignment Search Tool) searches a huge sequence database for matches to a query sequence and yields a list of sequence alignments, each accompanied by an estimate of the likelihood that the alignment occurred by chance
True or False:
Primary structure is more conserved than tertiary structure. Explain.
False
Tertiary structure is more conserved than primary structure.
(on the basis of 3D structure, Actin and HSP70 are paralogs, despite their very different functions)
Over time, what is more conserved: a protein’s structure or its sequence?
a protein’s structure
How can sequences be used to ID evolutionary relationships?
Similar sequences in different organisms, once a scoring system shows the similarity is statistically significant, indicate that the sequences (and the organisms carrying them) may have descended from a common ancestor
Describe the UCSC genomics database
It is a genome browsing database hosted by the University of California, Santa Cruz
It features the genomes of 46 vertebrates and lets you make comparisons and correlations between them, based on the search you enter
Define medical informatics
Using information (data and computational tools) to support medical decision making
AI being used in medicine is a great example of this
Define ClinPhen AND describe how it is used (include how HPO terms are involved)
ClinPhen extracts and prioritizes patient phenotypes directly from medical records in order to expedite the genetic disease diagnosis process.
It does this by expressing the phenotypes as HPO (Human Phenotype Ontology) terms
Describe GARD
GARD = Genetic and Rare Diseases Information Center
It is an online resource that hopes to make diagnoses quicker and more accurate by compiling known info from around the world in one place