Bioinformatics (Week 2) Flashcards

1
Q
  1. GUI orientated programs
    - Graphical User Interface
    - Multifunctional
A

Graphical User Interface
• Easy to use
• No need to understand the basic concepts to get started
• But you DO need to understand the basic concepts to use it properly
• One command for Multiple algorithms or steps
• Visually orientated = quick view of multiple sets of data
• Good if you're looking for patterns
• Publication-quality images

Multi-functional
• Can contain a suite of programs
• Helpful when working with complex data/intricate question
• Use multiple formats
• Usually platform independent
• Most are available for Mac, Windows + Linux
• Mostly commercial software
• Can cost hundreds of thousands of Rands + patented code
• Restricted performance + ability
• Graphic rendering computationally intensive
• Graphic nature also limits certain functions or operations

2
Q
  1. Command line (CLI) programmes
A

A program that accepts text input to execute operating system functions.

  • Represents bulk of available software
  • Thousands of derivatives for specific problems
  • Specialist + multi-functional
  • Many focus on a single task, while some come as program suites
  • Helpful when working with multiple sets of data from varying sources
  • Very specific file formats
  • Difficult to master
  • Lack of a GUI is intimidating, and there is little support or help
  • True Open Source
  • Variations of programs pop-up overnight
  • Free!
  • Mostly Unix system dependent
  • Mac has moderate availability with almost none for Windows
  • Efficient use of processing power
  • Development focuses on efficient disk and CPU usage
  • Difficult to interpret
  • Without the aid of visualization software, it is more difficult to properly visualize for publishing or reports
  • Data returned is all text-based + heavily reliant on user editing for analysis
3
Q
  1. Homologs, paralogs and orthologs
A

Homologs:
- Proteins/genes that share a common ancestor + usually show good sequence and/or structure similarity to one another

Paralog
- Homolog which arose through gene duplication within the same species/genome

Ortholog
- Homolog which arose through speciation (found in different species)

4
Q
  1. Similarity and homology
A

Similarity

  • Likeness or % identity between 2 sequences
  • sharing a number of bases or amino acids
  • Does not imply homology
  • Quantifiable i.e., CAN say x% similar

Homology

  • Shared ancestry
  • Derived from a common ancestral sequence
  • Implies similarity
  • Not Quantifiable i.e., NOT x% homologous
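
Because similarity is quantifiable, it can be computed directly from an alignment. The sketch below is one possible way to do it, assuming the two sequences are already aligned with "-" as the gap character; the function name and the choice of denominator (all aligned columns) are assumptions for illustration, and other tools divide by a different length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    # A column counts as identical only if it holds the same non-gap residue.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))
print(percent_identity("AC-T", "ACGT"))
```

A high value here says nothing by itself about ancestry; it only measures likeness.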
5
Q
  1. Global and local alignment
A

Global alignment:
- Attempts to align the complete length of one sequence with the complete length of the other
  * Needleman-Wunsch (1970) algorithm

Local alignment:
- Attempts to find the stretches of highest similarity between the two sequences
  * Smith-Waterman (1981) algorithm
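
The difference between the two approaches can be seen in a small dynamic-programming sketch. This is a simplified illustration, not the published algorithms in full: it returns only the best score, uses a linear gap penalty, and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score: Needleman-Wunsch style (global) or
    Smith-Waterman style (local), with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the borders accumulate gap costs.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local: a cell never drops below zero, so an alignment can restart anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global reads the bottom-right cell; local reads the best cell anywhere.
    return best if local else score[-1][-1]


query, subject = "TTTGATTACATTT", "CCGATTACACC"
print(alignment_score(query, subject, local=False))
print(alignment_score(query, subject, local=True))
```

With these two sequences the local score is higher than the global one, because the local alignment is free to ignore the unrelated flanking residues.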

6
Q
  1. Pairwise alignments
A
  • Describes the percent identity two sequences share + their % similarity
  • The score of a pairwise alignment includes positive values for exact matches + other scores for mismatches and gaps
  • Based on a scoring matrix
7
Q
  1. PAM and BLOSUM

• PAM10 and BLOSUM80
• PAM250 and BLOSUM30
• Scoring matrix

A
  • PAM + BLOSUM scoring matrices provide rules for assigning scores.
  • PAM10 and BLOSUM80 = examples of matrices appropriate for comparison of closely related sequences.
  • PAM250 and BLOSUM30 are examples of matrices used to score distantly related proteins.

Scoring matrix
- look under objective for diagram

8
Q
  1. What is BLAST, what is its main purpose, and what are the types of BLAST?
A

• BLAST (Basic Local Alignment Search Tool) allows rapid comparison of a query sequence against a database.

MAIN purpose = infer homology

Types of BLAST (Diagram - objective)

  • Nucleotide-based BLAST
    * exact word match, one word match
  • Protein-based BLAST
    * neighborhood words, two word matches within 40 residues
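
The exact word match used for nucleotide searches can be sketched as a simple word index. This is a toy illustration of the seeding idea only, not BLAST's actual implementation: the function names and the small `word_size` are assumptions, and the protein case (neighborhood words, two hits within 40 residues) is not shown.

```python
from collections import defaultdict


def index_words(subject, word_size):
    """Map every word of length word_size in the subject to its start positions."""
    index = defaultdict(list)
    for pos in range(len(subject) - word_size + 1):
        index[subject[pos:pos + word_size]].append(pos)
    return index


def find_seeds(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where a query word matches exactly."""
    index = index_words(subject, word_size)
    seeds = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        for s_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


print(find_seeds("ACGTA", "TTACGTATT", word_size=4))
```

Each seed marks a position from which an alignment could then be extended in both directions.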
9
Q
  1. Raw score (S)
A
  • Calculated as the sum of the scores for identities and substitutions (from the substitution matrix) + gap scores.
  • Substitution scores are given by a look-up table (PAM, BLOSUM)
  • Gap scores are calculated from G (gap opening penalty) and L (gap extension penalty)
  • For a gap of length n, gap cost = G + Ln
  • Usually a high value for G and a lower value for L
  • Alignment specific
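
The raw score calculation above can be sketched as follows. The substitution function here is a toy stand-in (+5 for a match, -4 for a mismatch) rather than a real PAM or BLOSUM table, and G = 11, L = 1 are illustrative values only; all names are assumptions.

```python
def gap_cost(length, gap_open=11, gap_extend=1):
    """Cost of one gap of `length` positions: G + L * n."""
    return gap_open + gap_extend * length


def raw_score(aligned_a, aligned_b, substitution, gap_open=11, gap_extend=1):
    """Sum substitution scores over aligned residues and subtract the cost of each gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    run_length = 0    # length of the gap currently being read
    run_owner = None  # which sequence holds the current gap ("a", "b" or None)
    for a, b in zip(aligned_a, aligned_b):
        owner = "a" if a == "-" else "b" if b == "-" else None
        if owner != run_owner and run_length:
            # The current gap has ended: charge it once as a whole.
            total -= gap_cost(run_length, gap_open, gap_extend)
            run_length = 0
        run_owner = owner
        if owner is None:
            total += substitution(a, b)
        else:
            run_length += 1
    if run_length:
        total -= gap_cost(run_length, gap_open, gap_extend)
    return total


def toy_substitution(a, b):
    """Toy look-up: a real search would use a PAM or BLOSUM matrix here."""
    return 5 if a == b else -4


print(raw_score("ACGT--AC", "ACGTTTAC", toy_substitution))
```

In this example six matches contribute 30 and the single gap of length 2 costs 11 + 2 = 13. Because G is much larger than L, one long gap is cheaper than several short ones.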
10
Q
  1. Bit score (S’)
A
  • Derived from the raw alignment score S in which the statistical properties of the scoring system used have been taken into account
  • Bit score calculated based on frequency of a particular aligned character pair compared to frequency of the same character pair in a random sequence
  • Bit scores have been normalized with respect to the scoring system (normalized for "effective length") + can be used to compare alignment scores from different searches
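
The normalization is usually written as S' = (λS − ln K) / ln 2, where λ and K are statistical parameters of the scoring system. A minimal sketch follows; the λ and K values in the example are illustrative assumptions, since the real values depend on the matrix and gap penalties used.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score S into a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameter values only; real searches report their own lambda and K.
print(round(bit_score(100, lam=0.267, k=0.041), 2))
```

Because λ and K absorb the properties of the scoring system, two bit scores can be compared even when the searches used different matrices.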
11
Q
  1. E-value
A

• Significance of each alignment is computed as an E-value
• E-value = number of hits with score ≥ S expected by chance when searching with a given query in a database of a particular size

  • Based on random database of similar size
  • Lower means more significant indicating that the observed sequence similarity is unlikely to have arisen purely by chance
  • Used to assess statistical significance of alignment
  • E-value is closely related to the standard P value (P = 1 − e^(−E)); for small values (E < 0.01) the two are nearly identical
  • Significant if E < 0.001 (smaller numbers = more significant)

• A sequence alignment with an E-value of 0.001 means that this similarity has roughly a 1 in 1000 chance of occurring by chance alone
OR
• In a database of similar size, 0.001 is the expected number of alignments with similar or better S scores arising by chance
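
As a rough sketch, the E-value can be computed from the bit score S' as E = m · n · 2^(−S'), where m is the query length and n the total database length, and the matching P value is P = 1 − e^(−E). The function names below are assumptions, and real tools use length-corrected ("effective") values for m and n rather than the raw lengths used here.

```python
import math


def e_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least `bit_score`."""
    return query_length * database_length * 2.0 ** (-bit_score)


def p_value(e):
    """Probability of at least one chance hit, given an expected count `e`."""
    return -math.expm1(-e)  # same as 1 - exp(-e), but accurate for tiny e


small_db = e_value(40, query_length=300, database_length=1_000_000)
large_db = e_value(40, query_length=300, database_length=2_000_000)
print(small_db, large_db)  # doubling the database doubles the E-value
print(p_value(0.001))      # nearly equal to the E-value when E is small
```

The same bit score therefore becomes less significant as the query or the database grows, and more significant as the score itself rises.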

12
Q

11.1 E-value depends on

A

(a) Similarity score (bit score): a higher similarity score (e.g., high % sequence identity) = smaller E-value
(b) Length of the query: a given similarity score is more easily obtained by chance with a longer query sequence, so longer queries = larger E-values
(c) Size of the database: since a larger database makes a given similarity score easier to obtain by chance, larger database = larger E-values

  • very low E values (< e-100) = homologs or identical genes
  • moderate E values (~ e-50) = related genes
  • long list of gradually declining E values indicates large gene family
  • long regions of moderate similarity are more significant than short regions of high identity