Databases Flashcards by jackieee m

UniProt

protein

EMBL

gene

Ensembl

genome

NCBI

bacteria

methods for sequence comparison

Diagonal plots (dot plots)
FASTA
BLAST
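
A diagonal (dot) plot can be sketched in a few lines. This is a minimal illustration, assuming exact matching of windows of a chosen length (the `window` parameter and the function names are hypothetical); real dot-plot tools usually add scoring matrices and match thresholds.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return a 0/1 matrix: cell [i][j] is 1 when the `window`-long stretch
    starting at seq_a[i] is identical to the one starting at seq_b[j]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [1 if seq_a[i:i + window] == seq_b[j:j + window] else 0 for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw matches as '*' and mismatches as '.'; similar regions show up as diagonal runs."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    print(render(dot_plot("GATTACA", "GATTACA", window=2)))
```

Comparing a sequence with itself always fills the main diagonal; internal repeats show up as extra diagonals parallel to it.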

FASTA

heuristic algorithm that speeds up alignment by using hash tables of k-tuples (short words of length k) to find matching sequence patterns (k-tuple hits)
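
The k-tuple idea can be sketched as a hash table (a Python dict) mapping every word of length k in the database sequence to its positions. Query words are looked up and the hits are counted per diagonal (the offset between database and query position). FASTA then rescores and joins the best diagonal regions, which this sketch leaves out; function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_ktuple_index(seq, k):
    """Hash table: each k-tuple (word of length k) -> list of start positions in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, target, k=2):
    """Count k-tuple hits per diagonal (offset = target_pos - query_pos).
    Diagonals with many hits mark candidate regions worth aligning properly."""
    index = build_ktuple_index(target, k)
    hits = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits[j - i] += 1
    return hits
```

For example, `diagonal_hits("ACGTAC", "TTACGTAC", k=3)` piles 4 hits onto diagonal 2, where the query sits inside the target.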

BLAST

an algorithm for comparing primary biological sequence information (nucleotide or amino acid sequences), optimized for speed
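
BLAST gets its speed from a seed-and-extend heuristic: find short word hits, then extend them without gaps until the score drops too far below the best seen (an X-drop rule), giving a high-scoring segment pair (HSP). The sketch below simplifies a lot: exact word matches instead of the scored neighbourhood words protein BLAST uses, a +1/-1 match/mismatch scheme instead of a substitution matrix, and no gapped extension or E-values. All names and parameter values are assumptions for illustration.

```python
def find_seeds(query, subject, w=3):
    """All (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    words = {}
    for j in range(len(subject) - w + 1):
        words.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1) for j in words.get(query[i:i + w], [])]


def _best_extension(pairs, match, mismatch, x_drop):
    """Walk outward over residue pairs; return (best score gain, residues used to reach it).
    Stops once the running score falls more than x_drop below the best so far."""
    run = best = best_len = 0
    for n, (x, y) in enumerate(pairs, start=1):
        run += match if x == y else mismatch
        if run > best:
            best, best_len = run, n
        elif best - run > x_drop:
            break
    return best, best_len


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of one seed: (score, query range, subject range)."""
    gain_r, len_r = _best_extension(zip(query[i + w:], subject[j + w:]), match, mismatch, x_drop)
    gain_l, len_l = _best_extension(zip(reversed(query[:i]), reversed(subject[:j])), match, mismatch, x_drop)
    score = w * match + gain_r + gain_l
    return score, (i - len_l, i + w + len_r), (j - len_l, j + w + len_r)


def best_hsp(query, subject, w=3, **scoring):
    """Highest-scoring HSP over all seeds, or None if there is no word hit."""
    hsps = [extend_seed(query, subject, i, j, w, **scoring) for i, j in find_seeds(query, subject, w)]
    return max(hsps, default=None)
```

Only the short seeds are looked up exhaustively; the costly comparison happens just around them, which is the trade of sensitivity for speed.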

blastn

compares your nucleotide query sequence with a database of nucleotide sequences

blastp

compares your protein query sequence with a database of protein sequences

blastx

first translates your nucleotide query sequence into amino acids in all 6 reading frames, then compares the translated sequences with a protein database
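
Translating in 6 reading frames means 3 frames on the given strand (starting at the 1st, 2nd and 3rd base) plus 3 frames on the reverse complement. A sketch using the standard genetic code; codons not in the table (e.g. containing N) become X, and frame labels like "+1"/"-1" are just a naming choice here.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order for each position
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; '*' is a stop, 'X' an unrecognised codon. Leftover bases are dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Frames +1..+3 read the given strand from offsets 0..2; -1..-3 read the reverse complement."""
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames
```

blastx compares all six translations with the protein database; tblastn applies the same idea to each database sequence instead.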

tblastn

compares your protein query sequence with a nucleotide database, after translating each database nucleotide sequence into protein in all 6 reading frames

tblastx

translates both the nucleotide query and the nucleotide database sequences in all 6 reading frames, then compares the protein sequences. Useful for finding protein-coding regions; comparing at the protein level gives less noise
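
Which of the five programs to use depends only on the query type and the database type. A small lookup makes that explicit (the function name and arguments are hypothetical):

```python
def choose_blast_program(query_type, db_type, translate_both=False):
    """Pick a BLAST program from the sequence types ('nucleotide' or 'protein')."""
    programs = {
        ("nucleotide", "nucleotide"): "tblastx" if translate_both else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # query translated in 6 frames
        ("protein", "nucleotide"): "tblastn",  # database translated in 6 frames
    }
    try:
        return programs[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown sequence types: {query_type!r}, {db_type!r}") from None
```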

PROSITE

protein database of domains, families and functional sites. Its uses include identifying possible functions of newly discovered proteins and analysis of known proteins for previously undetermined activity
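
Many PROSITE entries describe a site as a pattern: elements are separated by '-', x means any residue, [..] lists allowed residues, {..} excluded ones, x(n) or x(n,m) a repeat, and < / > anchor the N- or C-terminus. The converter below is a sketch that handles only this core syntax (it rejects forms such as '>' inside brackets) so a pattern can be searched with Python's `re` module; the example pattern in the note is only illustrative.

```python
import re

# One element: x (any residue), a single residue, [allowed residues] or {forbidden residues},
# optionally followed by a repeat count (n) or range (n,m)
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern (elements joined by '-', ending in '.') to a Python regex."""
    out = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchor_start = element.startswith("<")  # '<' = N-terminus
        anchor_end = element.endswith(">")      # '>' = C-terminus
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, low, high = match.groups()
        if residues == "x":
            piece = "."
        elif residues.startswith("{"):
            piece = "[^" + residues[1:-1] + "]"
        else:
            piece = residues  # a single residue or an [allowed] class is already valid regex
        if high is not None:
            piece += "{" + low + "," + high + "}"
        elif low is not None:
            piece += "{" + low + "}"
        out.append(("^" if anchor_start else "") + piece + ("$" if anchor_end else ""))
    return "".join(out)
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")` gives `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`, which `re.search` can then run over a protein sequence.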

what is PSI-BLAST

(Position-Specific Iterated BLAST): an iterative search using the protein BLAST algorithm.

how is PSI-BLAST used

a list of all closely related proteins is created
these proteins are combined into a general “profile” sequence, which summarizes significant features present in their sequences
a query against the protein database is then run using this profile, and a larger group of proteins is found
this larger group is used to construct another profile, and the process is repeated
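
The steps above can be sketched with a position-specific scoring matrix (PSSM): log-odds scores per column of an ungapped alignment. This toy version assumes a uniform background frequency (1/20), a simple pseudocount, ungapped hits and a made-up score threshold; real PSI-BLAST uses sequence weighting, smarter pseudocounts, gapped alignment and E-value cutoffs. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Log2-odds score for each residue in each column of an ungapped alignment,
    assuming a uniform background frequency of 1/20."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background) for a in AMINO_ACIDS})
    return pssm


def best_hit(pssm, sequence):
    """Slide the profile along a sequence; return (best score, start) or None if too short."""
    width = len(pssm)
    scored = (
        (sum(col.get(res, 0.0) for col, res in zip(pssm, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    )
    return max(scored, default=None)


def iterate_profile_search(seed_family, database, threshold, rounds=3):
    """Profile -> search -> add new hits -> rebuild profile, until nothing new is found."""
    family = list(seed_family)
    for _ in range(rounds):
        pssm = build_pssm(family)
        new_hits = []
        for seq in database:
            hit = best_hit(pssm, seq)
            if hit is not None and hit[0] >= threshold:
                segment = seq[hit[1]:hit[1] + len(pssm)]
                if segment not in family and segment not in new_hits:
                    new_hits.append(segment)
        if not new_hits:
            break  # converged: the profile finds no new members
        family.extend(new_hits)
    return family
```

Each round the profile absorbs the new members, so residues seen in them (like the E in column 3 after "ACEE" joins) stop being penalised in the next search.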

HMMER

software for working with profile HMMs (hidden Markov models; profile HMMs are a probabilistic generalization of sequence profiles).
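
HMM-based tools score a sequence by summing over every possible path through the hidden states, which dynamic programming (the forward algorithm) does efficiently. The sketch uses a toy two-state model with invented probabilities; it is not a profile HMM and not HMMER's implementation, just the core recursion. A brute-force path enumeration is included only to check it.

```python
from itertools import product

# Toy two-state model with invented probabilities (NOT a profile HMM, NOT HMMER):
# "D" = inside a domain, "B" = background; symbols "h" = hydrophobic, "p" = polar.
STATES = ("D", "B")
START = {"D": 0.5, "B": 0.5}
TRANS = {"D": {"D": 0.9, "B": 0.1}, "B": {"D": 0.2, "B": 0.8}}
EMIT = {"D": {"h": 0.7, "p": 0.3}, "B": {"h": 0.4, "p": 0.6}}


def forward(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """P(obs | model) via the forward recursion, O(len(obs) * states^2)."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][symbol]
            for t in states
        }
    return sum(alpha.values())


def brute_force(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Same probability by enumerating every hidden path (exponential; only for checking)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for prev, cur, symbol in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emit[cur][symbol]
        total += p
    return total
```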

Pfam

protein family database: covers domains and protein family definitions, built with HMMs

MAFFT & Clustal Omega

MSA (multiple sequence alignment) programs for amino acid or nucleotide sequences

programs for phylogenetic tree constructions

ClustalW: distance method
PHYLIP (protpars): parsimony
Tree, PaPa: progressive alignment, followed by parsimony
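
Distance and parsimony methods score trees differently. A sketch, assuming an alignment given as a dict of equal-length strings and a binary tree written as nested tuples (both conventions invented here): p-distances are the kind of pairwise input a distance method starts from, and Fitch's small-parsimony count gives the number of changes a tree needs. protpars uses its own protein-specific counting rules, so the plain character count below is a simplification.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, ignoring gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances: the input a distance method builds its tree from."""
    return {(m, n): p_distance(alignment[m], alignment[n]) for m, n in combinations(sorted(alignment), 2)}


def fitch(tree, site):
    """Fitch small parsimony for one site on a binary tree of nested tuples.
    Returns (candidate states at this node, number of changes below it)."""
    if isinstance(tree, str):
        return {site[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total changes over all sites; a parsimony method prefers the tree with the lowest score."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1] for i in range(length))
```

For the alignment A=AAG, B=AAG, C=ATG, D=ATT, the tree ((A,B),(C,D)) needs 2 changes and ((A,C),(B,D)) needs 3, so parsimony prefers the first.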

PDB (protein data bank)

a repository for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids
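
In the classic PDB file format, coordinates sit in fixed columns of ATOM/HETATM records (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46, 47-54). A minimal reader, assuming well-formed legacy-format lines; newer entries are often distributed as mmCIF, which this does not handle. The names `Atom`, `parse_atoms` and `distance` are hypothetical.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Read ATOM/HETATM records from legacy fixed-column PDB text (0-based slices of the 1-based columns)."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def distance(a, b):
    """Euclidean distance between two atoms, in the file's units (angstroms)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```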

PDB contents

x-ray structures
NMR structures
EM models
Models from predictions/modelling

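3D structure (protein) databases
- hierarchical fold classification

SCOP (manually annotated)
CATH (largely automated)
= both are hierarchical: SCOP uses class, fold, superfamily, family; CATH uses class, architecture, topology, homologous superfamily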

PROSITE & Pfam

sequence based classification of protein domains, families

ENZYME

enzyme nomenclature
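
ENZYME is organised around EC numbers, which have four levels: class, subclass, sub-subclass and serial number (e.g. EC 1.1.1.1). A small parser, assuming the four-part dotted form with '-' marking an unassigned level; the class names are the seven top-level EC classes, and the function name is hypothetical.

```python
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases", 4: "Lyases",
    5: "Isomerases", 6: "Ligases", 7: "Translocases",
}
LEVELS = ("class", "subclass", "sub-subclass", "serial number")


def parse_ec(ec):
    """Split an EC number into its four levels; '-' marks a level that is not specified."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels: {ec!r}")
    result = dict(zip(LEVELS, parts))
    result["class name"] = EC_CLASSES.get(int(parts[0]), "unknown")
    return result
```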

BRENDA

enzyme data: nomenclature, isolation & purification, stability, etc.

KEGG

resource for understanding high level functions & utilities of the biological system

Programs predicting secondary structures (proteins)

PSIPRED

Predicting simple sequence features (proteins)

SignalP (signal peptides)
TargetP (subcellular localisation)

Ab initio homology modelling

Rosetta: generates the model by assembling short fragments taken from known structures

software/tools for systems biology

1. obtaining data sets from databases (TCGA, cBioPortal)
2. first analysis of data sets based on gene expression: MultiExp
3. networks: STRING, IMP
4. pathway analysis: PANTHER, DAVID, KEGG
5. MetaCore
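
Pathway analysis tools such as PANTHER and DAVID commonly ask whether a gene list hits a pathway more often than chance (over-representation). A sketch of the basic one-sided hypergeometric test, assuming N genes in the background, K of them in the pathway, n genes in your list and k of those in the pathway; the real tools differ in the exact statistic and in multiple-testing correction, and the function name is hypothetical.

```python
from math import comb


def hypergeometric_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: drawing n genes from N, of which K are in the pathway.
    k = how many of your n genes fall in the pathway. Small values suggest over-representation."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("need 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, all 4 genes of a list landing in a 5-gene pathway, out of 20 background genes, gives p = 5/4845 (about 0.001).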

HADDOCK

high ambiguity driven protein-protein docking; uses biochemical and/or biophysical interaction data to guide the docking

Swiss-Prot

reviewed section of UniProtKB: high-quality, manually annotated and non-redundant protein sequence database

TrEMBL

unreviewed section of UniProtKB: protein sequences with computationally generated annotation and large-scale functional characterization
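
In UniProtKB FASTA files the first header field tells the two sections apart: 'sp' for Swiss-Prot (reviewed) and 'tr' for TrEMBL (unreviewed). A minimal header parser, assuming the standard '>db|accession|entry_name description' layout; the function name is hypothetical and the second example header is made up.

```python
def parse_uniprot_header(header):
    """Split '>db|accession|entry_name description' and flag the review status."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name, _, description = rest.partition(" ")
    if db not in ("sp", "tr"):
        raise ValueError(f"not a UniProtKB header: {header!r}")
    return {
        "reviewed": db == "sp",  # sp = Swiss-Prot (manually reviewed), tr = TrEMBL (unreviewed)
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


if __name__ == "__main__":
    print(parse_uniprot_header(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"))
    print(parse_uniprot_header(">tr|X0EXAMPLE1|X0EXAMPLE1_HUMAN Made-up example protein"))
```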