Class 1 Flashcards
What is Entrez and what are some example databases under it?
Entrez: an integrated database retrieval system that provides access to a diverse set of 39 databases
ex: PubMed, Nucleotide and Protein Structures, Complete Genomes, Taxonomy, SRA
What are accessions and what do they tell us?
Unique identifiers: each accession is a unique string associated with a piece of data.
They can also convey relationships between objects.
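As a small illustration of how an identifier can carry a relationship, here is a minimal sketch that splits a versioned identifier into its base accession and version number, so that two versions of the same record can be recognised as related. The function name `split_accession` and the example string are assumptions for illustration, and the "accession.version" layout is only one common convention.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    Returns version None when the identifier has no '.version' suffix.
    """
    base, sep, version = identifier.partition(".")
    return base, (int(version) if sep else None)


# Two versions of the same record share the base accession.
older = split_accession("NM_000546.5")
newer = split_accession("NM_000546.6")
same_record = older[0] == newer[0]
```

Here `same_record` is true because both strings share the same base accession and differ only in version.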
Why are bioinformatic analyses build specific?
Every time a new reference build is released, the coordinates of the genome change. Because sequences are aligned against the reference of a particular build, the resulting coordinates, and any analysis based on them, are specific to that build.
Gene annotations are associated with specific genome builds
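One way to keep this in mind when coding is to store the build name together with every coordinate. The sketch below is hypothetical (the `Position` class and `same_locus` function are made up for illustration) and simply refuses to compare positions that come from different builds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    build: str   # name of the reference build, e.g. "GRCh38"
    chrom: str   # chromosome name
    pos: int     # coordinate on that chromosome


def same_locus(a, b):
    """Compare two positions, but only if they use the same build."""
    if a.build != b.build:
        raise ValueError(
            f"cannot compare {a.build} and {b.build} coordinates directly"
        )
    return a.chrom == b.chrom and a.pos == b.pos
```

The same numeric coordinate on two builds may point at different sequence, so raising an error is safer than silently returning an answer.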
What are hard links? Give an example.
Direct connections between entries in different databases
ex: a link to a paper describing a nucleotide sequence
What are neighbours in Entrez and what is something that one must be careful of?
Neighbours are indirect links/connections between entries in different databases that are based on similarity, but not identity.
Note that the definition of similarity for each database is subjective
Give examples of types of neighbours.
- 3D structures/similar structures (proteins with the same secondary structures)
- Similar papers in PubMed (number of overlapping words)
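To make the "overlapping words" idea concrete, here is a toy similarity score between two titles: shared words divided by all distinct words. This is only an assumed stand-in for illustration; it is not how PubMed actually computes similar papers, and the function name `word_overlap` is made up.

```python
def word_overlap(text_a, text_b):
    """Fraction of distinct words shared by two texts (0.0 to 1.0)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


score = word_overlap("gene expression in yeast", "gene expression in mouse")
```

The two titles share three of five distinct words. Whether that counts as "similar" depends on the threshold chosen, which is why the definition of similarity is subjective.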
What is something that you should be careful of when finding similar sequences through BLAST?
The establishment of neighbour connections is qualitative and subject to debate. You have to choose appropriate E-value cut-offs to decide what is included as a neighbour and what is not. Connections are curated by individuals or based on principles.
It is assumed that sequences with high sequence similarity often have related biological functions
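The effect of the cut-off can be seen with a small sketch. The hit names and E-values below are invented for illustration, and `neighbours` is a hypothetical helper, not part of BLAST.

```python
def neighbours(hits, max_evalue):
    """Keep the IDs of hits whose E-value is at or below the cut-off."""
    return [hit_id for hit_id, evalue in hits if evalue <= max_evalue]


# Invented (id, E-value) pairs; a lower E-value means a more significant hit.
hits = [("seqA", 1e-50), ("seqB", 1e-5), ("seqC", 0.5)]

strict = neighbours(hits, 1e-10)
lenient = neighbours(hits, 1e-3)
```

With the strict cut-off only seqA is kept, while the lenient one also keeps seqB, so the set of neighbours depends on a decision made by the analyst.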