Bioinformatics Flashcards
What are some of the uses of bioinformatics?
- next-generation sequencing
- gene expression analysis
- microarrays
What is sequence identity?
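The extent to which two sequences have exactly the same residues at aligned positions, usually given as a percentage. 100% identity is a perfect match between an unknown sequence and a known sequence.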
What is sequence homology?
Similarity that reflects shared evolutionary ancestry. In practice it is inferred from a partial match between an unknown sequence and a known sequence.
Give 3 protein sequence databases. Do they have cross collaboration?
UniProtKB, UniProtKB/TrEMBL and RefSeqP - no cross collaboration.
Give 3 DNA sequence databases. Do they have cross collaboration?
ENA/EMBL, GenBank and DDBJ - cross collaboration (they exchange data through the International Nucleotide Sequence Database Collaboration, INSDC).
What does Ensembl do?
Provides annotations of numerous genomes, including their protein products.
What is a pairwise sequence alignment?
Two sequences are lined up against each other. The possible alignments between them are scored, and the best one is checked for sequence homology.
What is a global alignment used for?
Looking for homology between the same protein from different species.
What is a local alignment used for?
To match cDNA to genomic DNA.
To align two different protein sequences that share a common domain.
What is a global alignment?
Aligns two sequences along their entire length. Any homologous sequences can be aligned globally as long as they are similar enough.
What is a local alignment?
Alignment of two sequences such that homologous subsequences are aligned in between regions of non-related and unaligned sequences.
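Worked example: the difference between a global and a local alignment can be seen by scoring the same pair of sequences both ways. This is a minimal sketch using dynamic programming in the style of Needleman-Wunsch (global) and Smith-Waterman (local). The function name and the match/mismatch/gap values are assumptions chosen for illustration, not values from any particular tool.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score for sequences a and b.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subsequences.
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment has to pay for gaps at the start of either sequence.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(table[i - 1][j - 1] + pair,  # align the two residues
                       table[i - 1][j] + gap,       # gap in b
                       table[i][j - 1] + gap)       # gap in a
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            table[i][j] = cell
    return best if local else table[-1][-1]


# Two unrelated sequences that share only the short region ACGT.
seq_a = "TTTTACGTAAAA"
seq_b = "GGGGACGTCCCC"
print("global:", alignment_score(seq_a, seq_b))
print("local: ", alignment_score(seq_a, seq_b, local=True))
```

With these settings the local score picks out the shared ACGT region, while the global score is dragged down by the unrelated flanking residues.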
Why would gaps be introduced?
To account for insertions and deletions between the sequences, in order to produce the best possible global alignment.
What is the fasta format?
The most commonly used text format for sequences.
A header line starting with > followed by a description of the sequence and its accession number, then the sequence itself on the following lines.
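Worked example: a minimal sketch of reading FASTA text, assuming each record is a header line starting with > followed by one or more sequence lines. The function name and the example records are made up for illustration.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Return a dict mapping each header (without the '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are joined, since one sequence may span many lines.
            records[header] += line
    return records


# Hypothetical records, not real accessions.
example = """>seq1 hypothetical nucleotide record
ACGTACGT
ACGT
>seq2 hypothetical protein record
MKTAYIAK
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence))
```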
What is queried by blastn and in which database?
Nucleotide query, nucleotide database.
What is queried by blastp and in which database?
Amino acid query, amino acid database.
What is queried by tblastn and in which database?
Amino acid query, translated nucleotide database.
What is queried by blastx and in which database?
Translated nucleotide query, amino acid database.
What is queried by tblastx and in which database?
Translated nucleotide query, translated nucleotide database.
Which BLAST searches are protein searches?
blastp and blastx (both search an amino acid database).
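Worked example: the five BLAST programs above can be summarised as a lookup table of (query type, database type). This is only a memory aid written as code. The dictionary and helper names are assumptions, and nothing here calls BLAST itself.

```python
# program -> (what the query is, what the database is)
BLAST_PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide"),
    "blastp":  ("amino acid", "amino acid"),
    "tblastn": ("amino acid", "translated nucleotide"),
    "blastx":  ("translated nucleotide", "amino acid"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def programs_for_database(database_type: str) -> list:
    """Names of the programs that search the given type of database."""
    return sorted(name for name, (_, database) in BLAST_PROGRAMS.items()
                  if database == database_type)


print(programs_for_database("amino acid"))
```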
What is the score?
A measure of alignment quality, calculated by adding points for matches/similarities and subtracting points for mismatches/gaps.
What are identities?
The number of residues that are identical in the alignment.
What are positives?
The number of similar residues in the alignment.
What does gaps mean in a BLAST output?
The number of gaps in the alignment.
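Worked example: score, identities, positives and gaps can all be counted from one pair of aligned sequences. This is a minimal sketch, not how BLAST itself does it: the similarity groups and the point values below are toy assumptions, whereas BLAST takes similarity from a substitution matrix and charges gaps with separate opening and extension penalties.

```python
# Toy assumption: residues in the same group count as "similar".
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def summarise_alignment(query: str, subject: str, match: int = 2,
                        similar: int = 1, mismatch: int = -1, gap: int = -2) -> dict:
    """Count identities, positives and gaps, and add up a simple score."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = gaps = score = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1
            score += gap
        elif q == s:
            identities += 1
            positives += 1  # identical residues also count as positives
            score += match
        elif any(q in group and s in group for group in SIMILAR_GROUPS):
            positives += 1
            score += similar
        else:
            score += mismatch
    return {"length": len(query), "identities": identities,
            "positives": positives, "gaps": gaps, "score": score}


summary = summarise_alignment("MKV-LD", "MRVALA")
print(summary)
```

For this pair the columns are three identities (M, V, L), one similar pair (K/R), one gap and one mismatch (D/A).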
What is the E (expect) value?
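The number of alignments with a score at least this good that would be expected by chance in a database of that size. The lower the E value, the more reliable the alignment is.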
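Worked example: for a raw score S, a query of length m and a database of total length n, the E value is commonly written as E = K * m * n * exp(-lambda * S). The sketch below applies that formula. The default values of lam and k are placeholder assumptions, since the real ones depend on the scoring matrix and gap penalties in use.

```python
import math


def expect_value(raw_score: float, query_length: int, database_length: int,
                 lam: float = 0.318, k: float = 0.134) -> float:
    """E = K * m * n * exp(-lambda * S); lam and k are placeholder assumptions."""
    return k * query_length * database_length * math.exp(-lam * raw_score)


# Hypothetical numbers: a 100-residue query against a database of one million residues.
for score in (40, 60, 80):
    print(score, expect_value(score, 100, 1_000_000))
```

A higher score gives a smaller E value, and the same score becomes less significant as the database grows.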
How can sequences be considered homologous?
More than 25% identity in amino acid sequence or more than 75% identity in nucleotide sequence, for sequences longer than 100 amino acids.
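Worked example: the rule of thumb above written as a check. The function name is made up, and applying the same length cut-off of 100 to nucleotide sequences is an assumption, since the card only states it for amino acids.

```python
def passes_identity_rule(percent_identity: float, aligned_length: int,
                         is_protein: bool) -> bool:
    """True if the alignment meets the rule-of-thumb threshold for homology."""
    threshold = 25.0 if is_protein else 75.0
    # Assumption: the 100-residue minimum is applied to both sequence types.
    return aligned_length > 100 and percent_identity > threshold


print(passes_identity_rule(30.0, 150, is_protein=True))    # protein, 30% identity
print(passes_identity_rule(30.0, 150, is_protein=False))   # nucleotide, 30% identity
```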
What is Clustal?
A multiple sequence alignment program.
What is considered a good alignment?
- at least 10-30 residues long
- have at least 1-3 stars
- have 5-7 colons
- have a few periods
What does * represent in a Clustal output?
An entirely conserved column.
What does : represent in a Clustal output?
A column where all of the residues have roughly the same size and hydrophobicity.
What does . represent in a Clustal output?
A column where either the size or the hydrophobicity of the residues has been conserved over the course of evolution (weaker conservation than a colon).
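Worked example: a sketch of how the *, : and . symbols could be assigned to the columns of an alignment. The strong and weak residue groups are the ones commonly quoted for ClustalW and should be treated as an assumption to check against the Clustal documentation. The function names are made up.

```python
# Assumed groups: ":" if all residues fit one strong group, "." if they fit one weak group.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Conservation symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # columns containing a gap get no symbol
    if len(residues) == 1:
        return "*"  # entirely conserved column
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def consensus_line(aligned_sequences: list) -> str:
    """Symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol("".join(column)) for column in zip(*aligned_sequences))


alignment = ["MKSA", "MRTG", "MKAW"]  # hypothetical aligned sequences
for sequence in alignment:
    print(sequence)
print(consensus_line(alignment))
```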
Why are multiple sequence alignments more informative than pairwise alignments?
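They compare many sequences at once, so the overall % identity is lower and the residues that stay conserved across all of the sequences stand out.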
What does a phylogenetic tree show?
The evolutionary relationship between species or sequences.