L4 - Bioinformatics 1 Flashcards
what is bioinformatics
bridges the gap between BIG data sets and actual biological understanding
overview of the bioinformatics process
have a query/question
probe database
evaluate results
2 types of nucleotide databases
primary:
experimental data deposited directly by scientists
= GenBank
secondary:
info from a primary database but processed
= RefSeq
name a primary and secondary database
GenBank:
international database, with data shared between EMBL, NCBI and Japan (DDBJ).
can have multiple copies of the same nucleotide sequence, each with a ‘UNIQUE’ accession number
RefSeq:
manually curated database from GenBank
what is UniProtKB comprised of
TrEMBL:
protein sequences automatically annotated by computer from nucleotide sequences = unreviewed and redundant
Swiss-Prot:
manual, high-quality annotation, reviewed, non-redundant gold standard.
non-redundant = 1 record per molecule per species for fully sequenced organisms
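A minimal sketch of what "1 record per molecule per species" means in practice, assuming a toy list of records. The accessions (`HYP001` etc.) and the helper name `group_by_molecule_and_species` are invented for illustration and are not real UniProt identifiers or API calls.

```python
from collections import defaultdict

# Hypothetical records: the accessions are made up for this example.
records = [
    {"accession": "HYP001", "species": "Homo sapiens", "molecule": "insulin"},
    {"accession": "HYP002", "species": "Homo sapiens", "molecule": "insulin"},
    {"accession": "HYP003", "species": "Mus musculus", "molecule": "insulin"},
]

def group_by_molecule_and_species(records):
    """Group accessions so each (molecule, species) pair appears once."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["molecule"], rec["species"])].append(rec["accession"])
    return dict(groups)

groups = group_by_molecule_and_species(records)

# Pairs with more than one accession are redundant in the TrEMBL sense;
# a non-redundant resource would merge them into a single record.
redundant = {key: accs for key, accs in groups.items() if len(accs) > 1}
print(redundant)
```

Here the two human insulin entries would collapse into one record, while the mouse entry stays separate because it is a different species.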
what else does UniProt do
cross-reference and link to other resources
useful entry point to start investigating a protein
what is FASTA format
displays both DNA and protein sequences without spaces
commonly used for analysis programmes
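A small sketch of reading FASTA text, to make the format concrete. Each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The function name `parse_fasta` and the two example records are hypothetical, not taken from any real database.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    parts_by_header = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            parts_by_header[header] = []
        elif header is not None:
            parts_by_header[header].append(line)
    # Sequence lines are joined with no spaces or line breaks.
    return {h: "".join(parts) for h, parts in parts_by_header.items()}

# Hypothetical records: one DNA sequence split over two lines, one protein.
example = """>seq1 hypothetical DNA record
ATGGCC
TTGA
>seq2 hypothetical protein record
MKTAYIAK
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence), sequence)
```

The same function handles DNA and protein because FASTA does not mark the sequence type; the analysis programme decides how to interpret the letters.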
the linking of sequence, structure and function
amino acid sequence determines the structure, which in turn dictates a protein's function
Homologous proteins share conserved amino acid patterns, adopting similar folds with related functions
= we can use this to predict the function of a protein if we know its amino acid sequence
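One simple way to put a number on "shared conserved amino acid patterns" is percent identity between two sequences that have already been aligned. This is only a sketch: the function name `percent_identity` and both sequences are made up, and real homology searches use alignment tools and scoring matrices instead of a plain identity count.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where both sequences have the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Gap characters ('-') never count as a match.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)

# Hypothetical pre-aligned protein fragments differing at 2 of 10 positions.
known_protein = "MKTAYIAKQR"
query_protein = "MKTAHIAKQK"

identity = percent_identity(known_protein, query_protein)
print(identity)
```

A high identity to a protein of known function is a reason to hypothesise a related function for the query, not proof of it.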
what are protein domains
structural units of about 50 amino acids
proteins can contain multiple domains
what is InterProScan
compares query sequence to all other sequences
assigns a probability to each amino acid at each position within a domain, whether that position is conserved or variable
a threshold score determines if a certain domain is likely to be present
= InterProScan does all this and tells you which part of your protein is likely to have which domains
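The idea of per-position probabilities plus a threshold score can be sketched with a toy profile. This is not how InterProScan is actually implemented; the 3-position `PROFILE`, the `BACKGROUND` and `PSEUDO` values, the thresholds and the query sequence are all assumptions chosen to keep the example small.

```python
import math

# Hypothetical domain profile: probability of each residue at each position.
PROFILE = [
    {"C": 0.9, "S": 0.1},
    {"G": 0.5, "A": 0.5},
    {"H": 0.8, "N": 0.2},
]
BACKGROUND = 0.05  # assumed uniform background frequency (1 in 20 amino acids)
PSEUDO = 0.01      # assumed probability for residues never seen at a position

def window_score(window):
    """Sum of log-odds scores (profile vs background) across the window."""
    return sum(
        math.log2(column.get(residue, PSEUDO) / BACKGROUND)
        for column, residue in zip(PROFILE, window)
    )

def find_domain_hits(sequence, threshold):
    """Slide the profile along the sequence; keep windows scoring >= threshold."""
    width = len(PROFILE)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = window_score(sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits

query = "MKCGHLLSAN"  # hypothetical query protein
for start, score in find_domain_hits(query, threshold=5.0):
    print(start, query[start:start + len(PROFILE)], round(score, 2))
```

Raising the threshold keeps only the strongest match, which mirrors the trade-off between missing real domains and reporting false ones.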
what is PROSITE
identifies post-translational modifications
finds patterns using short motifs linked to modifications
generates hypotheses about your protein that you can then go and test
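A sketch of motif searching with a PROSITE-style pattern, using the N-glycosylation site pattern `N-{P}-[ST]-{P}` as the example. The converter below covers only the basic pattern syntax, and the function names `prosite_to_regex` and `find_motifs` and the query sequence are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")      # drop separators
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repeat counts
    regex = regex.replace("x", ".")                     # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")   # sequence-end anchors
    return regex

def find_motifs(sequence, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]

# N, then anything except P, then S or T, then anything except P.
N_GLYCOSYLATION = "N-{P}-[ST]-{P}"

query = "MKNGTAANPSLNVSA"  # hypothetical protein sequence
hits = find_motifs(query, N_GLYCOSYLATION)
print(hits)
```

Short patterns like this match often by chance, so each hit is a hypothesis about a possible modification site to test experimentally.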