Bioinformatics Flashcards
Sanger sequencing
-good for short DNA sequences roughly 300-1000bp
-requires a short primer to initiate DNA synthesis
-can be completed in a few hours
commonly used databases for annotated DNA sequences
-GenBank
-RefSeq
-EMBL-EBI
-UniProtKB
-BioCyc collection
FASTA format
-text-based format used for storing DNA, RNA, and single-letter amino acid (aa) sequence data
-consists of a header line and the sequence
-top line is the header, which starts with ">"
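A minimal sketch of reading this format in Python, assuming each record is a ">" header line followed by one or more sequence lines. The function name parse_fasta and the two example records are made up for illustration.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without the '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # start a new record
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical records: one DNA sequence split over two lines, one protein
example = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another made-up record
MKTAYIAKQR
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

In practice a library such as Biopython is usually used for this; the sketch only shows the header/sequence structure.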
GenBank format
-detailed descriptive record of an entry in the GenBank database
-default format that will appear when you search the database
-may include additional information aside from the name and sequence, such as the original publication, the sequence's location on the genome or plasmid, or metadata about the sample from which the sequence was obtained
BLASTn
-searches nucleotide databases using a nucleotide query sequence
-DNA to DNA
BLASTp
-general sequence ID and similarity searches of protein sequences
-protein to protein
BLASTx
-identifying potential protein products encoded by nucleotide query
-DNA to protein
tBLASTn
-identifying database sequences encoding proteins similar to the query
-protein to DNA
tBLASTx
-identifying nucleotide sequences similar to the query based on their coding potential
-DNA to DNA (query and database are both translated and compared as protein)
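The five BLAST cards above can be summarized as a lookup by query type and database type. This is only a study aid, not part of any BLAST tool: the function choose_blast and its translated flag are hypothetical names.

```python
def choose_blast(query, database, translated=False):
    """Pick a BLAST program from the query and database types.

    query, database: "dna" or "protein"
    translated: only matters for DNA vs DNA; True means compare the
                translated (protein-level) sequences instead of the nucleotides
    """
    if query == "dna" and database == "dna":
        return "tBLASTx" if translated else "BLASTn"
    if query == "protein" and database == "protein":
        return "BLASTp"
    if query == "dna" and database == "protein":
        return "BLASTx"
    if query == "protein" and database == "dna":
        return "tBLASTn"
    raise ValueError("query and database must each be 'dna' or 'protein'")


print(choose_blast("dna", "protein"))           # DNA to protein
print(choose_blast("protein", "dna"))           # protein to DNA
print(choose_blast("dna", "dna", translated=True))
```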
caveats of CARD
-results from bioinformatics analyses vary from strain to strain, meaning our search result applies only to the genome we analyzed and may not hold for other strains of the same species
-the results you obtain are putative (assumed), not definitive: although the presence of an AMR gene suggests the organism is resistant to an antibiotic, an in vitro experiment is still required for functional validation
MSA (multiple sequence alignment)
-first step in determining whether certain nucleotides or amino acids in a gene or protein are conserved across multiple strains or species
-first step for inferring the phylogenetic relationships between a set of sequences you seek to compare
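A rough sketch of what "conserved" means once sequences are aligned: a column is conserved when every sequence has the same residue at that position. It assumes the alignment has already been produced by an MSA tool, with "-" marking gaps. The function name and the three short sequences are made up for illustration.

```python
def conserved_columns(aligned):
    """Return the 0-based positions where all aligned sequences agree."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all be the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}   # distinct characters in column i
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical alignment of three sequences ("-" is a gap)
alignment = [
    "ATG-CA",
    "ATGTCA",
    "ACGTCA",
]
print(conserved_columns(alignment))  # [0, 2, 4, 5]
```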
phylogenetic trees
-predict the evolutionary relationships that various species share based on physical or genetic characteristics
-sequences are aligned, curated and organized into a tree based on various complex parameters
-input sequences need to be in FASTA format
-branch lengths directly relate to amount of genetic change
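One simple way to put a number on "amount of genetic change" between two aligned sequences is the proportion of sites that differ (often called the p-distance). This is only an illustration: tree-building programs generally use more complex substitution models, and the function name and example sequences here are made up.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ("-") are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped positions
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


# Hypothetical aligned sequences: 1 difference over 6 compared sites
print(p_distance("ATGTCA", "ACGTCA"))
```

A larger value means more change between the two sequences, which corresponds to a longer path between them on the tree.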