Databases Flashcards
UniProt
protein
EMBL
gene
Ensembl
genome
NCBI
bacteria
methods for sequence comparison
- Diagonal plots
- FASTA
- BLAST
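A diagonal (dot) plot can be sketched in a few lines: place one sequence on each axis and mark every cell where the two residues match, so that similar regions show up as diagonal runs. This is a minimal illustration only; the function names `dot_plot` and `render` are made up for this card, and real tools usually add a window size and a match threshold to reduce noise.

```python
def dot_plot(seq_a, seq_b):
    """Return a matrix that is True wherever seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' for a mismatch."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Comparing a sequence with itself gives an unbroken main diagonal.
    print(render(dot_plot("GATTACA", "GATTACA")))
```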
FASTA
heuristic algorithm that speeds up alignment with a hash (lookup) table: the sequences are split into k-tuples (short words of length k), and matching k-tuple hits are used to find similar regions
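The k-tuple lookup idea can be sketched as follows: index every word of length k in one sequence with a hash table, then look up each query word and count hits per diagonal (subject position minus query position). This is a simplified illustration, not the FASTA program itself; `build_ktuple_table` and `diagonal_hits` are hypothetical names, and the later FASTA stages (rescoring and joining the best diagonals) are left out.

```python
from collections import Counter, defaultdict


def build_ktuple_table(sequence, k):
    """Map each k-tuple (word of length k) to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        table[sequence[i:i + k]].append(i)
    return table


def diagonal_hits(query, subject, k=2):
    """Count k-tuple hits per diagonal (subject position - query position)."""
    table = build_ktuple_table(subject, k)
    diagonals = Counter()
    for q_pos in range(len(query) - k + 1):
        word = query[q_pos:q_pos + k]
        for s_pos in table.get(word, []):
            diagonals[s_pos - q_pos] += 1
    return diagonals


if __name__ == "__main__":
    # A diagonal with many hits suggests a region worth aligning in detail.
    print(diagonal_hits("ACGT", "TTACGT", k=2).most_common(1))
```

A larger k makes the search faster but less sensitive, because fewer words match by chance and short similarities are missed.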
BLAST
an algorithm for comparing primary biological sequence information (nucleotide or amino acid sequences), optimized for speed.
blastn
compares your nucleotide query sequence with a database of nucleotide sequences
blastp
compares your query protein sequence with a database of protein sequences
blastx
first translates your nucleotide query sequence into amino acids in all 6 reading frames, then compares the resulting protein sequences with a protein database
tblastn
compares your query protein sequence with a nucleotide database after translating each database sequence into protein in all 6 reading frames
tblastx
translates both the query nucleotide sequence & the database nucleotide sequences in all 6 reading frames & then compares the protein sequences. Looks for protein-coding regions. Good choice: less noise
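The "all 6 reading frames" step used by blastx, tblastn and tblastx can be sketched like this: three frames on the given strand plus three on its reverse complement. This is a minimal illustration assuming the standard genetic code and plain A/C/G/T input; the function names and the `+1` ... `-3` frame labels are choices made for this card, not BLAST output conventions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops are '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations of the 3 forward and 3 reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCC").items():
        print(frame, protein)
```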
PROSITE
protein database of domains, families & functional sites. Its uses include identifying possible functions of newly discovered proteins and analysing known proteins for previously undetermined activity
what is PSI-BLAST
(Position-Specific Iterated BLAST): iterative search using the protein BLAST algorithm.
how is PSI-BLAST used
- a list of all closely related proteins is created (initial protein BLAST search)
- these proteins are combined into a general “profile” (a position-specific scoring matrix), which summarizes the significant features present in their sequences
- a query against the protein database is then run using this profile; a larger group of proteins is found
- this larger group is used to construct another profile -> process repeated
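The profile-building step can be sketched as a position-specific scoring matrix: per alignment column, count each residue, add a pseudocount, and take the log-odds against a background frequency. This is a simplified illustration, not PSI-BLAST's actual procedure; it assumes ungapped, equal-length sequences, a uniform background, and a flat pseudocount, and `build_profile` and `score` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build log-odds scores per column from ungapped aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, candidate):
    """Sum the column scores of a candidate of the same length."""
    return sum(column[res] for column, res in zip(profile, candidate))


if __name__ == "__main__":
    profile = build_profile(["MKV", "MKI", "MRV"])
    # Sequences resembling the family score higher than unrelated ones.
    print(score(profile, "MKV"), score(profile, "WWW"))
```

In each iteration, the sequences that score well against the profile would be added to the alignment and the profile rebuilt, which is why the search becomes more sensitive to distant relatives.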
HMMER
software for working with sequence HMMs (hidden Markov models = probabilistic profiles of protein families).
Pfam
protein family database. Defines domains & protein families using multiple sequence alignments & HMMs
MAFFT & Clustal Omega
multiple sequence alignment (MSA) programs for amino acid or nucleotide sequences
programs for phylogenetic tree constructions
- ClustalW: distance method
- PHYLIP/protpars: parsimony
- Tree, PaPa: progressive alignment, followed by parsimony
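A distance method starts from pairwise distances between the aligned sequences and then clusters them into a tree. The first step can be sketched with the simple p-distance (fraction of differing positions). This is an illustration only: the gap handling and the names `p_distance` and `distance_matrix` are assumptions, and the tree-building step itself (for example neighbour joining) is not shown.

```python
def p_distance(a, b):
    """Fraction of differing positions, skipping columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m in names
        for n in names
    }


if __name__ == "__main__":
    alignment = {"seq1": "ACGT", "seq2": "ACGA", "seq3": "TCGA"}
    for pair, dist in distance_matrix(alignment).items():
        print(pair, dist)
```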
PDB (protein data bank)
a repository for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids
PDB contents
x-ray structures
NMR structures
EM models
Models from predictions/modelling
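3D structures (proteins) databases - hierarchical fold classification
SCOP (manually annotated)
CATH (largely automated)
= both are hierarchical. SCOP: class, fold, superfamily, family. CATH: class, architecture, topology (fold), homologous superfamily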
PROSITE & Pfam
sequence based classification of protein domains, families
ENZYME
enzyme nomenclature
BRENDA
enzyme database: nomenclature, isolation & purification, stability etc.
KEGG
resource for understanding high level functions & utilities of the biological system
Programs defining secondary structures (proteins)
PSIPRED
Predicting simple sequence features (proteins)
SignalP (signal peptides), TargetP (subcellular localisation)
Ab initio modelling
Rosetta: generates the model by assembling short structural fragments
software/tools for systems biology
- obtaining data sets from databases (TCGA, cBioPortal)
- first analysis of data sets based on gene expression: MultiExp
- Networks: STRING, IMP
- Pathway analysis: PANTHER, DAVID, KEGG
- MetaCore
HADDOCK
High Ambiguity Driven protein-protein DOCKing
-> uses biochemical and/or biophysical interaction data to guide the docking
Swiss-Prot
manually reviewed section of UniProtKB. High-quality, manually annotated & non-redundant protein sequence database
TrEMBL
unreviewed. Contains protein sequences associated with computationally generated annotation & large-scale functional characterization