Lecture 08 - Bioinformatics Flashcards
What is bioinformatics
the collection, classification, storage, and analysis of biochemical and biological information using computers, especially as applied to molecular genetics and genomics
What is the INSDC
International Nucleotide Sequence Database Collaboration
Who makes up the INSDC
DDBJ, NCBI, ENA
What is the DDBJ
DNA Data Bank of Japan
What is the NCBI
National Center for Biotechnology Information
What is the ENA
European Nucleotide Archive
What is the FASTA format
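a header line starting with ">" that gives the identifier (often the accession number) and a description of what the sequence is, followed by the sequence itself on the lines below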
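Note: a minimal sketch of reading that layout, assuming the text before the first space in the header is the identifier and the rest is the description. The function names and the two records are made up for illustration.

```python
def _make_record(header, chunks):
    """Split a header into identifier and description, join sequence lines."""
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


# Hypothetical records, not taken from any database.
example = """>seq1 hypothetical example record
ATGCGT
TTAA
>seq2 another made-up record
GGCC
"""
records = parse_fasta(example)
```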
What is FASTQ used for
storing next generation sequencing reads: each record holds the sequence together with a quality score for every base
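Note: a rough sketch of one FASTQ record, assuming the common four-line layout (header starting with "@", sequence, a "+" separator line, quality string) and Phred+33 quality encoding. Neither assumption comes from the cards, and the read below is invented.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (identifier, sequence, scores)."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a four-line FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumed Phred+33: quality score = ASCII code of the character minus 33.
    scores = [ord(char) - 33 for char in quality]
    return header[1:], sequence, scores


# Hypothetical read used only for illustration.
read_id, bases, scores = parse_fastq_record(["@read1", "ACGT", "+", "II5!"])
```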
What is found in the GenBank Header
locus, definition, accession, version, keywords, source, references
What information is found in the locus
- locus name
- length of sequence
- molecule type
- GenBank division (3-letter code)
- date last modified
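Note: a simplified sketch of pulling those fields out of a nucleotide LOCUS line by splitting on whitespace. This assumes the length is followed by "bp" and that a topology word (such as "linear") may or may not be present. The line itself is hypothetical, and real records are better read with an established parser.

```python
def parse_locus_line(line):
    """Extract the LOCUS fields from a nucleotide record's first line."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("not a recognizable nucleotide LOCUS line")
    return {
        "name": tokens[1],
        "length": int(tokens[2]),
        "molecule_type": tokens[4],
        # Topology is optional, so it is only read when an 8th token exists.
        "topology": tokens[5] if len(tokens) == 8 else None,
        "division": tokens[-2],
        "date": tokens[-1],
    }


# Hypothetical LOCUS line, made up for this example.
locus = parse_locus_line(
    "LOCUS       EXAMPLE01    1200 bp    DNA     linear   PLN 01-JAN-2000"
)
```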
What is in the definition
a brief description of the sequence (may include source organism and gene name)
What is the accession number
a unique identifier for the sequence within the database
What is the version number
it increases each time the sequence is changed after it was first submitted; it is written after the accession number as accession.version (.1, .2, ...)
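Note: a small sketch of separating the accession from its version, assuming the usual "accession.version" form with the version after the last dot. The function name and the identifiers are made up for illustration.

```python
def split_accession_version(identifier):
    """Return (accession, version); version is None when it is absent."""
    accession, dot, version = identifier.rpartition(".")
    if not dot or not version.isdigit():
        return identifier, None  # no ".N" suffix found
    return accession, int(version)


# Hypothetical identifiers, used only to show the behaviour.
with_version = split_accession_version("AB000001.2")
without_version = split_accession_version("AB000001")
```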
What is a reference sequence
a high-quality sequence record curated by the NCBI (the RefSeq collection)
What is the format for RefSeq accession numbers
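a two-letter prefix followed by an underscore, then the number, e.g.
NM_ (mRNA)
NC_ (complete genomic molecule, e.g. a chromosome)
NG_ (genomic region)
NR_ (non-coding RNA)
NZ_ (genomic: complete genomes and unfinished whole-genome shotgun data)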
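Note: a minimal sketch of spotting a RefSeq-style accession by its prefix. It only checks the five prefixes listed on this card, so other RefSeq prefixes are deliberately not recognized. The accessions below are made up.

```python
# Prefixes taken from the card above; this is not the full RefSeq list.
CARD_PREFIXES = {"NM", "NC", "NG", "NR", "NZ"}


def refseq_prefix(accession):
    """Return the two-letter prefix if it is one from the card, else None."""
    prefix, underscore, rest = accession.partition("_")
    if underscore and rest and prefix in CARD_PREFIXES:
        return prefix
    return None


# Hypothetical accessions for illustration only.
hit = refseq_prefix("NM_000001.1")
miss = refseq_prefix("AB000001.2")
```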