Bioinformatics Flashcards by Michael Nel

Given a gene, how would you find information about it?

Find sequence (EMBL, DDBJ, GenBank)
Literature database (e.g. PubMed)
Genomics database (OMIM, formerly MIM)
Gene expression database (NCBI GEO)
Interaction databases (IntAct, BIND)
Metabolic pathway (ENZYME, KEGG, Reactome)
Mutation/polymorphism databases (dbSNP)

What is a database?

A data collection that is structured, searchable, updatable, cross-linked and publicly available.

Why does BLAST work?

Similar sequences tend to have similar function

Similar sequences tend to be evolutionarily related

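How can you be sure your BLAST match is significant?

E-value: the number of matches this good expected by chance (roughly equal to the probability of a chance match when E is small)

E = m * n * 2^(-S)
m - number of nucleotides your sequence was compared against (database size)
n - number of nucleotides in your sequence (query length)
2^(-S) - 2 to the power of minus the match (bit) score S; E gets smaller as the sequences get more similar.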
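
A minimal sketch of the E-value arithmetic, assuming S is the normalised bit score and the relationship E = m * n * 2^(-S). The database and query sizes below are hypothetical numbers chosen only for illustration.

```python
def blast_e_value(m: int, n: int, bit_score: float) -> float:
    """Expected number of chance hits, assuming E = m * n * 2^(-S)."""
    return m * n * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical sizes: 1,000,000,000 nt database, 500 nt query.
    for score in (30, 50, 100):
        print(score, blast_e_value(1_000_000_000, 500, score))
```

Each extra bit of score halves E, which is why a higher-scoring (more similar) match is less likely to be a chance hit.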

Which BLAST for amino acid (aa) sequences, and which for nucleotide sequences?

Amino acid - BLASTp
Nucleotide - BLASTn

What is BLASTx?

BLASTs a nucleotide sequence, translated in all 6 reading frames, against an aa (protein) sequence database
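
A rough sketch of what "translated in all 6 frames" means: three offsets on the forward strand and three on the reverse complement. It assumes the standard genetic code and an A/C/G/T-only input; the function names are illustrative and not part of BLAST.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna: str) -> dict:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```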

What does tBLASTn do?

BLASTs an aa sequence against a nucleotide sequence database that has been translated in all 6 frames

What does tBLASTx do?

Translates your nucleotide sequence into aa in all six frames and compares it against database nucleotide sequences that are also translated into aa in all six frames

Good for distantly related sequences.

MegaBLAST

Quicker than BLASTn but less sensitive

Use this for everything unless looking for distantly related sequences (use tBLASTx for that)

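PSI-BLAST

Position-Specific Iterated BLAST. Very sensitive BLAST that iteratively builds a position-specific scoring matrix, so it takes into account that some regions are more conserved than others. Takes a long time.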

What is special about multiple sequence alignments?

Can reveal subtle conservation of genome features, as these areas evolve/change more slowly. Alignments of more than 3 sequences can show evolutionary relationships.
E.g. demographic and ecological histories of populations: gene flow, size changes, natural selection, migrations.

Local vs global alignments

Global - end to end alignments

Local - specific regions of sequence

Common mismatch scoring schemes

Nucleotide mismatch
Aa mismatch (BLOSUM, PAM)

How are most multiple alignments done?

Build multiple alignments from pairwise alignments. Use mismatch scores to find the best score. Use a technique called dynamic programming.
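
A minimal sketch of dynamic programming for one pairwise alignment score. With `local=False` it behaves like a Needleman-Wunsch style global (end-to-end) alignment; with `local=True` it behaves like a Smith-Waterman style local alignment. The match/mismatch/gap values are arbitrary assumptions, not a BLOSUM or PAM matrix, and the function returns only the score, not the alignment itself.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Best alignment score of a vs b using a linear gap penalty."""
    # prev holds the previous row of the DP table.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))                           # global
    print(alignment_score("TTTACGTTTT", "GGACGGG", local=True))     # local
```

Progressive multiple aligners repeat this kind of pairwise scoring across all sequence pairs before merging the alignments.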

Pairwise alignment methods

ClustalW - global alignment, sequences ~20 kb long
MUSCLE - global and local, sequences ~100 kb long
MAUVE - global, sequences ~10 Mb long

Uses of sequence databases in bioinformatics

Retrieve known gene sequence
Finding info on gene
Compare sequence to others in DB
Submit sequence to be stored with rest
Find how many genes an organism has

Why is it harder to do gene prediction in humans vs bacteria?

Bacteria have specific and well understood promoter sequences (easy to identify). Protein coding sequences are one contiguous ORF.

Human promoters are less well understood and more complex (harder). Protein coding sequence is divided into exons and spliced variably.

Why would you want to know the GC content of a sequence?

Higher GC generally = longer protein coding region.
Affects melting temperature for PCR.
Different organisms have varying GC content.
Useful in mapping exon-rich regions.
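
A small sketch of computing GC content and a rough primer melting temperature. The melting temperature uses the Wallace rule (2 degrees C per A/T, 4 degrees C per G/C), which is an assumption brought in for illustration and is only a rule of thumb for short oligos.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C for a short primer."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGC"  # hypothetical primer
    print(gc_content(primer), wallace_tm(primer))
```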

Which genes are more homologous, this or that?

You can’t quantify homology: genes either are homologous or they are not. It is a conceptual framework to define the evolutionary relationship between two genes. You can quantify similarity. If they come from different species you can look at orthology.
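
Similarity, unlike homology, is a number. A minimal sketch, assuming the two sequences are already aligned to equal length with "-" as the gap character, and counting identity only over ungapped columns (other conventions exist):

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped aligned columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
```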

Why is bioinformatics needed?

Small and large scale analysis
New lab techniques
Single -> whole genome
Collection/storage of data
Manipulation of data

Examples of sequence databases

EMBL
DDBJ
GenBank

What do genomics databases contain?

Info about gene chromosomal location
Nomenclature
Links to sequence databases

E.g. OMIM (formerly MIM)

What is an isoform?

Alternative form of a sequence, e.g. a different transcript or protein produced from the same gene

Examples of gene expression databases

NCBI GEO

How to remove vector sequence from a DNA sample sequence

Run the sequence against a vector sequence database, e.g. UniVec, and trim the matching regions

How to choose the most likely translation result

Usually the longest ORF
Starting with Met
Ending in a stop
No stops within the sequence
Confirm with promoter prediction

Egs of gene prediction software

GeneMark | GENSCAN

Translators and promoter prediction software

NCBI ORF Finder | Promoter 2.0 Prediction Server

Protein sequence databases

UniProt
GenPept
RefSeq

Database of 3D structures

Protein Data Bank (PDB)

Protein domain / family databases are integrated into which site?

InterPro

What is a motif?

Sequence of aa encoding a certain molecular function
Short = motif
Long = functional domain

Short linear motifs

Unrelated proteins sharing a functional feature are likely to contain similar motifs, etc.

Classification of motifs

Modification
Ligand
Targeting
Cleavage

What is a regular expression?

A pattern that determines which aa is allowed in each position | Used by PROSITE
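
A hedged sketch of how a PROSITE-style pattern maps onto a regular expression. It covers only the common syntax elements (`x` = any residue, `[..]` = allowed, `{..}` = not allowed, `(n)` or `(n,m)` = repeats, `<` and `>` = sequence ends); the converter is illustrative and not PROSITE's own tooling. The example pattern is the N-glycosylation site pattern, and the test protein is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to Python regex syntax."""
    regex = pattern.rstrip(".").replace("-", "")
    # {P} means "anything but P"; do this before reusing braces for repeats.
    regex = regex.replace("{", "[^").replace("}", "]")
    # x(2,4) means "2 to 4 of any residue".
    regex = regex.replace("(", "{").replace(")", "}")
    return regex.replace("x", ".").replace("<", "^").replace(">", "$")


def find_motif(pattern: str, protein: str) -> list:
    """Return (1-based position, matched text) for all hits, overlaps included."""
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_motif("N-{P}-[ST]-{P}.", "MNGSANPTQ"))  # hypothetical protein
```

Short patterns like this match often by chance, which is the over-prediction problem with regular expressions.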

BioEdit analysis for cloning

Nucleotide composition
Six frame translation
Determine ORF
Length of insert/DNA
RE mapping

Transition vs transversion

Transition is purine to purine or pyrimidine to pyrimidine (e.g. A to G, T to C). Transversions are the opposite: purine to pyrimidine or vice versa. Twice as many transversions are possible, but about twice as many transitions actually occur.
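
A short sketch that classifies a substitution and counts transitions and transversions between two aligned sequences. It assumes equal-length, gap-free DNA strings and skips any position containing a character other than A, C, G or T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("need two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_counts(seq_a: str, seq_b: str) -> tuple:
    """Count (transitions, transversions) between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        if classify_substitution(x, y) == "transition":
            ts += 1
        else:
            tv += 1
    return ts, tv


if __name__ == "__main__":
    print(ts_tv_counts("AACT", "GATA"))
```

Enumerating every ordered pair of different bases gives 4 possible transitions and 8 possible transversions, which is where "twice as many transversions possible" comes from.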

Types of sequence formats

FASTA
GenBank
NEXUS
PHYLIP

Types of sequence viewers

SeaView
AliView
Mesquite
MEGA

What is an open reading frame?

A string of in-frame codons that each specify an amino acid
Starts with ATG (Met), or sometimes an alternative start codon such as GTG (Val)
Ends with a stop codon
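
A minimal sketch of ORF finding on the forward strand only, assuming ATG as the sole start codon and TAA/TAG/TGA as stops. Real ORF finders also scan the reverse strand and may allow alternative starts; the function names and the example sequence are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list:
    """Return (start, end, sequence) for ORFs in the three forward frames.

    start/end are 0-based, end-exclusive; min_codons counts the codons
    before the stop codon (including the ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # the stop codon closes the ORF
    return orfs


def longest_orf(dna: str):
    """Longest forward-strand ORF, or None if there is none."""
    return max(find_orfs(dna), key=lambda orf: orf[1] - orf[0], default=None)


if __name__ == "__main__":
    print(longest_orf("AAATGGCCTAAGG"))
```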

Gene prediction software

GeneMark
GENSCAN
Glimmer (microbial gene prediction system)

What are promoters?

DNA sequence involved in regulating transcription

Types of promoters

- core
- proximal
- distal

Functions of promoters

- integrate info about cell conditions and alter rate of transcription in response
- different components responsible for different parts of expression pattern

Tasks of bioinformatics

- identify promoter regions
- find TFBS and TFBS modules in a sequence
- discover novel TFBS motifs
- construct TFBS and their motifs
- analysis of expression data

How to represent TFBS motifs

- consensus sequence
- position weight matrix
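
A sketch of both representations built from a set of aligned binding sites. The four sites are made-up examples, and the pseudocount of 1 and uniform background of 0.25 per base are assumptions. The position weight matrix here stores log2 odds scores per position.

```python
import math
from collections import Counter


def consensus(sites: list) -> str:
    """Most common base at each aligned position."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def position_weight_matrix(sites: list, pseudocount: float = 1.0,
                           background: float = 0.25) -> list:
    """One dict per position mapping base -> log2(frequency / background)."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for column in zip(*sites):
        counts = Counter(column)
        pwm.append({base: math.log2((counts[base] + pseudocount) / total / background)
                    for base in "ACGT"})
    return pwm


def score_site(pwm: list, site: str) -> float:
    """Sum of per-position log-odds scores for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, site.upper()))


if __name__ == "__main__":
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # hypothetical sites
    pwm = position_weight_matrix(sites)
    print(consensus(sites), score_site(pwm, "TATAAT"), score_site(pwm, "GCGCGC"))
```

Unlike the consensus string, the matrix keeps the weaker preferences at each position, so near-matches still receive a graded score.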

Databases of TFBS motifs

TRANSFAC | JASPAR

What is phylogenetic footprinting?

Use of comparative genomics to infer functional genomic regions from conservation

What does phylogenetic footprinting require?

- comparison of correctly identified orthologous promoter regions
- conserved function across species
- species sufficiently diverged to reduce passive conservation

POSSUM workflow

- set of co-expressed genes
- automated sequence retrieval from Ensembl
- phylogenetic footprinting
- detection of TFBS
- statistical significance of binding sites

What are methods of miRNA identification based on?

- targets tend to be located in the 3' UTR
- some are complementary to the target RNA

What is a motif?

A sequence of amino acids encoding a particular molecular function

What is PROSITE?

Library of regular expressions, each describing a functional site such as an enzyme active site

Advantages of regular expressions

- memorable to humans
- computationally fast
- standardized in scripting languages
- can describe a motif very well

Disadvantages of regular expressions

- over-predict
- motif may vary in other lineages
- do not capture weaker preferences
- easy to make a poor representation

Example methods for identifying protein functional domains

Matrix/profile
Hidden Markov model
Sequence clustering

Bioinformatics Flashcards

(56 cards)