Lecture 08 - Bioinformatics Flashcards
What is bioinformatics
the collection, classification, storage and analysis of biochemical and biological information using computers especially as applied to molecular genetics and genomics
What is the INSDC
international nucleotide sequence database collaboration
Who makes up the INSDC
DDBJ, NCBI, ENA
What is the DDBJ
DNA data bank of Japan
What is the NCBI
national center for biotechnology information
What is the ENA
european nucleotide archive
What is the FASTA format
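a header line beginning with > that holds the identifier (such as the accession number) and a description of what kind of sequence it is, followed by the sequence itself on the lines below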
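A minimal sketch of reading FASTA text, assuming every record starts with a > header line and that the sequence may wrap over several lines. The function name parse_fasta and the example records are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records, not real database entries.
example = """>seq1 hypothetical example record
ATGCGT
TTAGC
>seq2 another made-up record
GGCCA
"""
records = parse_fasta(example)
```

Each tuple keeps the header text (without the >) next to the full sequence with the line breaks removed.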
What is FASTQ used for
next generation sequencing: it stores each read together with a quality score for every base
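A rough sketch of decoding one FASTQ record, assuming the common four-line layout (@ header, sequence, + separator, quality string) and the Phred+33 quality encoding. The encoding is an assumption, since older files may use a different offset. The function name and the read are invented for illustration.

```python
def parse_fastq_record(lines):
    """Decode a single four-line FASTQ record into (name, sequence, scores)."""
    header, seq, plus, qual = [line.strip() for line in lines]
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a four-line FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Assumption: Phred+33, so the score is the ASCII code minus 33.
    scores = [ord(char) - 33 for char in qual]
    return header[1:], seq, scores


# Hypothetical read, not real sequencing data.
name, seq, scores = parse_fastq_record(["@read1", "ACGT", "+", "II5!"])
```

With this made-up read the first two bases get a score of 40, the third 20 and the last 0.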
What is found in the GenBank Header
locus, definition, accession, version, keywords, source, references
What information is found in the locus
- locus name
- length of sequence
- molecule type
- GenBank division (3 letter code)
- date last modified
What is in the definition
a brief description of the sequence (may include source organism and gene name)
What is the accession number
a unique identifier for the sequence within the database
What is the version number
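it denotes any change to the sequence since it was first submitted: the number goes up each time the sequence data changes and is written after the accession number as accession.version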
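A small sketch of separating the accession from the version, assuming the identifier is written as accession.version. The helper name and the identifier AB123456.2 are made up for illustration.

```python
def split_accession_version(identifier):
    """Split 'accession.version' into (accession, version number)."""
    accession, dot, version = identifier.rpartition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError(f"no version suffix in {identifier!r}")
    return accession, int(version)


# Hypothetical identifier, not a real database record.
accession, version = split_accession_version("AB123456.2")
```

Here the accession stays the same across updates while the trailing number records how many times the sequence has changed.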
What is a reference sequence
high quality sequences that the NCBI have curated
What is the format for refSeq accession numbers
have an underscore
NM_
NC_
NG_
NR_
NZ_
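A short sketch that looks up what a RefSeq prefix stands for. The descriptions are a summary of the usual meaning of each prefix and go beyond what the card itself lists; the function name and the example accessions are invented.

```python
# Summary descriptions of the prefixes listed above (an addition to the card).
REFSEQ_PREFIXES = {
    "NM_": "mRNA transcript",
    "NC_": "complete genomic molecule, such as a chromosome",
    "NG_": "genomic region",
    "NR_": "non-coding RNA",
    "NZ_": "genomic record, often whole-genome shotgun data",
}


def refseq_molecule_type(accession):
    """Return a description for a RefSeq accession, or None if not recognised."""
    # RefSeq prefixes here are two letters followed by an underscore.
    return REFSEQ_PREFIXES.get(accession[:3])


# Hypothetical accessions used only to exercise the lookup.
kind = refseq_molecule_type("NM_000001")
unknown = refseq_molecule_type("AB123456")
```

An accession without the underscore prefix returns None, which is a quick way to tell a RefSeq record from an ordinary submission.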
What is the source
free-format information including an abbreviated form of the organism's name
What is the organism
the formal scientific name for the source organism
What does direct submission mean
the person who added the sequence and the date that the sequence was added to the database
What does CDS mean
coding sequence
What is an exon
a region of the genome that codes for a portion of spliced mRNA, rRNA, or tRNA; may contain the 5’UTR, all CDSs and the 3’UTR
What is an intron
a segment of DNA that is transcribed but removed from within the transcript by splicing together the sequences on either side of it
What is a gene
a region of biological interest identified as a gene and for which a name has been assigned
What is a location
a site between two adjoining nucleotides, such as a restriction enzyme site, that is indicated by listing the two points separated by ^ (for example 123^124)
What is a sequence span
indicated using the starting base number and the ending base number separated by two periods (for example 34..456)
the < and > symbols may be placed before the starting and ending numbers to indicate that an end point lies beyond the specified base number (for example <345..500)
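A minimal sketch of reading a sequence span, assuming the simple start..end form with an optional < before the start and an optional > before the end. The function name, the dictionary keys and the example spans are made up for illustration.

```python
import re

# Optional "<", start number, two periods, optional ">", end number.
SPAN_PATTERN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_span(span):
    """Parse 'start..end' into numbers plus flags for open-ended end points."""
    match = SPAN_PATTERN.match(span)
    if match is None:
        raise ValueError(f"not a simple span: {span!r}")
    return {
        "start": int(match.group(2)),
        "end": int(match.group(4)),
        "start_partial": match.group(1) == "<",  # begins before this base
        "end_partial": match.group(3) == ">",    # continues past this base
    }


plain = parse_span("34..456")
partial = parse_span("<1..>888")
```

The two flags record whether the feature really starts or ends outside the numbered bases.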
What is a location operator
a prefix that specifies what must be done to the indicated sequence to find or construct the location corresponding to the feature
What are common operators
complement
join
What is complement
find the complement of the sequence and then reverse it so that it is presented 5’ to 3’ (the reverse complement)
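A small sketch of the complement operator on a DNA string, assuming only the letters A, C, G and T appear. The function name is made up for illustration.

```python
# Translation table that swaps each base for its pairing partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


result = reverse_complement("ATGC")
```

For ATGC the complement is TACG, and reversing it gives GCAT.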
What is join
the indicated elements should be joined to form one contiguous sequence
What is complement join
the indicated elements should be joined to form a contiguous sequence and then take the complement and place it in 5’ to 3’ orientation
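A rough sketch of a location such as complement(join(1..3,7..9)), assuming spans are given as 1-based inclusive (start, end) pairs and the sequence contains only A, C, G and T. The function name, the sequence and the spans are invented for illustration.

```python
# Translation table that swaps each base for its pairing partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_complement_join(sequence, spans):
    """Join the spans in the order given, then reverse-complement the result."""
    # Spans are 1-based and inclusive, so start - 1 converts to a Python index.
    joined = "".join(sequence[start - 1:end] for start, end in spans)
    return joined.translate(COMPLEMENT)[::-1]


# Hypothetical sequence; the spans stand for join(1..3,7..9).
feature = extract_complement_join("AAACCCGGGTTT", [(1, 3), (7, 9)])
```

Joining bases 1 to 3 and 7 to 9 gives AAAGGG, and its reverse complement is CCCTTT.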