glossary Flashcards
ab initio gene discovery
- a method for identifying genes in a sequence when no prior information about the gene is available (no comparisons with other species and no transcript or gene product data)
- most ab initio approaches use a hidden Markov model to search for sequence motifs that are commonly found in genes, such as long open reading frames, intron-exon boundary signatures, and conserved upstream regulatory motifs
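One of the signals named above, the long open reading frame, can be illustrated with a minimal sketch. This is a plain codon scan, not a hidden Markov model; the function name `find_orfs` and the `min_codons` threshold are assumptions made for illustration, and only the three forward reading frames are scanned.

```python
# Hypothetical helper: a simple ORF scan, not a hidden Markov model.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    min_codons counts every codon in the ORF, including start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

orfs = find_orfs("CCATGAAATTTTAGGG")
```

Real gene finders combine many such signals probabilistically; this scan shows only what "long open reading frame" means in practice.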
ab initio protein structure prediction
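- predicting the tertiary (three-dimensional) structure of a protein from its amino acid sequence alone (how its helices, sheets, and coils fold together)
- alternative methods for obtaining a structure include X-ray crystallography, NMR spectroscopy, and model fitting by homology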
acrylamide
- compound used to make gels for electrophoresis (separation of proteins or nucleic acids)
affinity chromatography
- method for purifying proteins and their complexes based on their affinity for some compound
- this compound is crosslinked to a matrix in a column
- proteins are eluted when a buffer disrupts the interaction between the bound proteins and the compound on the column
alignment
- lining up two or more DNA/protein sequences
- maximizing # of identical nt/residues
- minimizing # of mismatches and gaps
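The three quantities an alignment trades off can be counted directly once two sequences are lined up. The sketch below assumes the sequences are already aligned to equal length with "-" as the gap character; the function name `alignment_counts` is hypothetical.

```python
def alignment_counts(a, b, gap="-"):
    """Count identical positions, mismatches, and gapped positions
    in two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            gaps += 1          # a gap in either sequence
        elif x == y:
            matches += 1       # identical nt/residue
        else:
            mismatches += 1    # aligned but different
    return matches, mismatches, gaps

counts = alignment_counts("ACG-T", "ACCTT")
```

An alignment algorithm searches for the arrangement that makes the first number large and the other two small; this sketch only evaluates a given arrangement.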
alternative splicing
- combination of different sets of exons to make two or more mature mRNA from the same primary transcript
- observed in higher eukaryotes
- single gene can create multiple protein isoforms
annotation
- linking information from the literature to database entries for genes/proteins
- in genome sequencing, annotation refers to the identification of likely genes using a combination of ab initio methods, homology searches and physical evidence
antibody
- secreted immunoglobulin molecule
- recognizes a stretch of up to 10 aa (aka the epitope)
- polyclonal Ab = group of different Ab recognizing different epitopes on the same protein
- monoclonal Ab = recognizes a single epitope and is made by a hybridoma cell line
association mapping
- search for genes that affect disease susceptibility
- done by testing alleles at DNA polymorphisms and seeing if they are present in affected individuals more or less commonly than expected by chance
- linkage disequilibrium (LD) both complicates and helps the mapping process
balancer chromosomes
- chromosomes that have been engineered to contain multiple inversions that suppress crossing over
- these are used to maintain recessive mutations in genetic stock
- balancers usually have a recessive lethal marker and a dominant visible genetic marker
base calling
- process of calling series of nt from a sequence trace
- usually automated, but manual work can resolve ambiguities
bioconductor project
- project using open-source software written in the R programming language
- used for statistical analysis of genomic data
case-control association mapping
- screening genetic markers that are associated with disease status
- based on comparing allele frequencies in a group of affected people and in a similar control group of unaffected ones
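The comparison of allele frequencies between cases and controls is often summarized with a chi-square statistic on a 2x2 table of allele counts. The sketch below is one common way to do this, not a method named in these notes; the function name and the example counts are made up for illustration.

```python
def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table
    of allele counts: alleles A and B in cases and in controls.
    Assumes every row and column total is greater than zero."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    observed = [case_a, case_b, ctrl_a, ctrl_b]
    # Expected counts if allele frequency were the same in both groups.
    expected = [row_case * col_a / n, row_case * col_b / n,
                row_ctrl * col_a / n, row_ctrl * col_b / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative counts only: allele A is 60% in cases and 40% in controls.
stat = allele_chi_square(60, 40, 40, 60)
```

A statistic above 3.84 corresponds to p < 0.05 for a single test with 1 degree of freedom; scanning many markers requires a stricter threshold.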
cDNA
- DNA that is complementary to mRNA
- first strand synthesis of cDNA is made by reverse transcriptase
- cDNA can also be ds
cDNA clone
- complementary DNA copy of a full length transcript
cDNA library
- collection of cDNA clones (usually isolated from a single tissue)
cDNA microarray
- an array of cDNA on a glass microscope slide or nitrocellulose filter
- they are hybridized to labeled mRNA for profiling gene expression
centiMorgan (cM)
- standard unit of genetic map distance
- it corresponds to a 1% probability of a crossover occurring between two sites in any meiosis
chain termination sequencing
- most commonly used method for sequencing DNA clones (up to 1 kb)
- based on a method first devised by Fred Sanger
- molecules of all possible lengths are made by random termination of DNA polymerization when a dideoxynucleotide (ddNTP) is incorporated
chemometrics
- series of analytical methods for quantifying chemical profiles
- this includes principal component analysis (PCA) and artificial neural networks
chemostat
- apparatus used for long-term exponential growth of microbial cultures
- fresh medium is introduced at the same time as liquid culture waste is removed
chromatin immunoprecipitation microarrays (ChIP chips)
- microarrays consisting of DNA that corresponds to potential regulatory regions of genes
- this is used to detect sequences that bind to transcription factors
chromosome painting
- this procedure aligns the chromosomes of two different eukaryotic species
- based on fluorescence in situ hybridization (FISH)
- a set of chromosome-specific probes from one species is made using unique combos of fluorescent dyes
- these probes paint the chromosomes in a mitotic chromosome spread from cells of the second species
chromosome walking
- this procedure clones a large contiguous portion of a chromosome
- a probe at end of one clone is used to identify overlapping genomic clones in a library
- procedure is repeated until region of interest is covered
clusters of orthologous genes (COGs)
- sets of genes from a collection of species
- they are hypothesized to encode the same gene product
- determined by pairwise best-match sequence similarity
complementation group
- a set of alleles that fail to complement (substitute for the function of) one another
- this usually indicates that they are mutations in the same locus
consensus sequence
- a hypothetical sequence that has the most common nucleotide or amino acid at each position in a multiple alignment of DNA or protein sequences
- can be an amino acid or a DNA sequence
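The "most common symbol at each position" rule can be sketched in a few lines. The function name `consensus` is hypothetical, the input is assumed to be a list of pre-aligned sequences of equal length, and ties are broken by whichever symbol appears first in the column.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Return the most common symbol in each column of a multiple alignment."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    # zip(*...) walks the alignment column by column.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned_seqs))

result = consensus(["ACGT", "ACGA", "TCGT"])
```

The same function works for protein alignments, since it makes no assumption about the alphabet.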
copy number variation (CNV)
- polymorphism in the number of copies of a stretch of DNA
- this includes deletions and duplications of whole genes
contig
- a contiguous stretch of cloned DNA
- may refer to:
1) a scaffold of overlapping clones (physically mapped)
2) a long stretch of DNA sequence assembled by merging two or more sequences
cosmid
- large insert plasmids
- usually exist as a single copy within host bacterial cells
- contain cos sites that allow in vitro packaging of inserts into phage particles if desired
CpG islands
- stretches of vertebrate DNA
- usually 1-2kb long
- contain a 10x higher frequency of doublet nucleotides CG than entire genome
- usually found near the 5’ end of genes
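The enrichment described above can be checked by comparing the CG doublet frequency in a candidate window with the frequency in a background sequence. This is a minimal sketch; the function names are hypothetical, and the toy strings in the usage line are not real genomic data.

```python
def cg_dinucleotide_frequency(seq):
    """Fraction of adjacent nucleotide pairs in seq that are exactly 'CG'."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    count = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return count / (len(seq) - 1)

def cg_enrichment(window, background):
    """Ratio of CG doublet frequency in a window to that in the background."""
    base = cg_dinucleotide_frequency(background)
    if base == 0:
        raise ValueError("background contains no CG doublets")
    return cg_dinucleotide_frequency(window) / base

# Toy strings for illustration only.
ratio = cg_enrichment("CGCG", "ACGT")
```

A window whose ratio approaches 10 against the whole genome would match the definition given here.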
Cre-Lox recombination system
- a combo of site specific recombinase (Cre) and its recognition site (lox) from the bacteriophage P1
- engineered into yeast, mouse, and other eukaryotic genomes to facilitate targeted recombination
C-Value paradox
- no apparent correlation between the number of genes and the amount of DNA in a genome
- there is a range of DNA content even in closely related organisms
- no relationship between complexity and DNA content
cytological map
- map of the location of genes or other DNA features relative to the banding patterns of the chromosomes of a species
data normalization
- process of removing systematic biases from microarray data
- these biases cause misinterpretation of apparent differences in transcript abundance
deficiency complementation mapping
- a method for fine-scale mapping of QTL, based on the variable ability of WT alleles to complement the effects of a deletion of a gene or genes in hemizygotes
- compares how different WT alleles behave when the other copy of the gene is missing
dideoxynucleotide
- a nucleotide without OH at both 2’ and 3’ carbon of the sugar backbone
- cannot covalently link to the next nucleotide in a growing DNA
- used in chain termination sequencing
DNA binding motif
- short stretch of DNA (8-12 nt) that can be recognized by a DNA binding protein
- motifs can be represented by a profile of the frequency of each nucleotide at each position
- these motifs help identify sequences important for gene regulation
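A frequency profile of this kind can be built from a set of aligned binding sites. The sketch below assumes the sites are DNA strings of equal length; the function name and the example sites are made up for illustration.

```python
def position_frequency_matrix(sites):
    """Return one dict per motif position giving the frequency of A, C, G, T."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all binding sites must have the same length")
    matrix = []
    for column in zip(*sites):  # one column per motif position
        column = [base.upper() for base in column]
        matrix.append({base: column.count(base) / len(column)
                       for base in "ACGT"})
    return matrix

# Hypothetical sites, not taken from any real transcription factor.
profile = position_frequency_matrix(["ACGT", "ACGA"])
```

Positions where one nucleotide has a frequency near 1 are the ones the binding protein constrains most strongly.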
DNA library
- collection of clones where each clone contains a different segment of genomic DNA or cDNA
- if clone can be transcribed and translated, it’s an expression library
ectopic expression
- activation of expression of a gene in a cell where it is not usually expressed (abnormal expression)
- can be done artificially or by disease
embryonic stem (ES) cells
- this cell line can be transformed and manipulated in culture
- then injected into the blastula (early embryo) where it integrates with and grows to contribute to the development of the adult animal
- injected embryos are chimeric
- if ES cells populate the germ line, a transgenic organism is produced in the next generation
enhancer
- orientation and distance independent regulatory sequence
- increase transcription levels and can occur anywhere in a genome
- can act over 100 kb
- one enhancer can affect the transcription of several genes
enhancer trap
- this transposable element is modified with a reporter gene
- when inserted into the genome adjacent to a gene, the enhancer that drives expression of that gene also drives expression of a reporter gene
Ensembl gene browser
- storage and resource for genomic data in Europe
- run by the Sanger Centre (Cambridge) and the European Bioinformatics Institute (within the European Molecular Biology Lab)
epistasis
- two definitions
- (Quantitative genetics) an interaction between two or more loci that results in non additive effects of one allele as a function of the genotype at the other locus
- (developmental/physiological genetics) describing a mutation whose phenotype is unaffected by another mutation
epitope
- portion of a protein, carb, or other molecule that is specifically recognized by an antibody
E-value
- expected number of sequences in a database that would by chance produce an equivalent or better alignment score than the one under consideration
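For BLAST-style searches the expected number of chance hits is commonly computed with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and S is the raw alignment score. The sketch below assumes that formula; the default values of `k` and `lam` are placeholders, since the real values depend on the scoring system.

```python
import math

def expected_chance_hits(score, query_len, db_len, k=0.1, lam=0.3):
    """E-value under the Karlin-Altschul formula E = K * m * n * exp(-lambda * S).

    k and lam are illustrative placeholders, not parameters of any
    particular scoring matrix.
    """
    return k * query_len * db_len * math.exp(-lam * score)

# Higher scores give smaller E-values; bigger databases give larger ones.
e_low_score = expected_chance_hits(40, query_len=300, db_len=1_000_000)
e_high_score = expected_chance_hits(50, query_len=300, db_len=1_000_000)
```

An E-value near 0 means a match that good is unlikely to arise by chance in a database of that size.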
expressed sequence tag (EST)
- sequenced piece of cDNA (subsequence)
- full-length cDNA defines structure of transcript, but EST is a tag that indicates that the particular sequence is part of a transcribed gene
- (online definition) EST is a tiny portion of an entire gene that can be used to help identify unknown genes and to map their positions within a genome
expression library
- a library of cDNA clones in a vector that allows the gene products to be expressed (transcribed and translated) in a controlled manner
expression vector
- a cloning vector that allows transcription and translation of a cDNA fragment that is inserted into the multiple cloning site
expressivity
- the severity of a disease OR
- the degree to which a trait is observed in affected individuals
- often affected by the environment
F3 design
- a genetic screen designed to isolate recessive mutations
- requires that the phenotype be measured in F3 progeny of the mutagenized individual
floxing
- a method for inducing a mutation at a precise time and place in an organism
- when a mouse has loxP binding sites on both sides of an exon of the gene to be mutated (placed by homologous recombination)
- this is crossed to a strain containing Cre recombinase in the tissue of interest
- exon is excised only in that tissue
fold recognition
- method for predicting the tertiary structure of a protein
- limited sequence similarity and predicted secondary structure are compared against known structures to find the previously described domain fold that most closely fits the unknown protein
forward genetics
- genetic analysis that starts with the phenotype and moves towards isolation of gene that causes the phenotype
- phenotype => gene
functional genomics
- study of the function of each and every gene
- (eg) biochemical activity, cell biology function, organismal function
- this includes genetic analysis, microarrays, proteomics, and computational biology
fusion protein
- hybrid protein made by fusion of two genes in an expression vector
- N terminal = tag (polyhistidine, small glutathione S-transferase)
- C terminal = protein of interest
GAL4
- a potent transcription factor from yeast that enhances gene expression only through a UAS sequence adjacent to the promoter
- if no UAS sequence, GAL4 will have no effect on transcription in heterologous genomes
- GAL4-UAS system is specifically used to drive expression of transgenes introduced into that genome
gene knock-in
- replacement of the endogenous gene with a different functional piece of DNA
- inserted gene is expressed in place of the original gene
- germline gene therapy uses gene knock-in to replace a defective gene with an active copy
- the replacements are performed using positive-negative double selection strategy in ES cells
gene knock outs
- a mutation that targets a specific gene, made by using homologous recombination to replace an exon of the target gene with a piece of foreign DNA (such as a lacZ reporter gene)
- insertional mutations can also cause gene knock outs
genetic fingerprinting
- strategy for testing subtle effects of mutations on the fitness of microbial strains in competition with other strains during long-term culture
genetic heterogeneity
- the observation that the same disease or phenotype can have multiple different genetic causes
- allelic heterogeneity = if different variants are within a single locus
genetic map
- in cM (centiMorgans)
- map of the order of and distance between genes based on recombination frequency between markers
- markers can be physical (molecular variants) or visible (mendelian loci)
- mapping populations may be pedigrees, crosses between lines, or radiation hybrid cell panels
genome-wide association study (GWAS)
- a study designed to scan the entire genome for SNPs and CNVs that are associated with a disease or trait
- at least 500k different genetic variants are measured in several thousand disease cases and a similar number of healthy controls
germ line
- the population of cells in eukaryotes that are destined to undergo meiosis to become oocytes or sperm
- the germ line is set aside very early in animal development
- in plants, the germ line is specified at the time of flowering
haplotype
- multi-site genotype of two or more polymorphisms on the same chromosome
- (eg) an individual who is homozygous for the G allele at one site and heterozygous for A and T at a nearby site has GA and GT haplotypes
Hardy-Weinberg equilibrium
- expectation that genotype frequencies in a population will tend to be stable and predictable as a simple function of individual allele frequencies
- equilibrium is broken by evolutionary forces (such as migration, inbreeding, mutation or selection)
- these forces can lead to increase or decrease in the number of heterozygotes
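The "simple function of individual allele frequencies" is the standard p², 2pq, q² relationship for a locus with two alleles. A minimal sketch, with a hypothetical function name and generic genotype labels:

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies for a two-allele locus,
    where p is the frequency of allele A and q = 1 - p is that of allele a."""
    if not 0 <= p <= 1:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

expected = hardy_weinberg_expected(0.5)
```

Comparing observed genotype counts with these expectations shows whether a population has an excess or a deficit of heterozygotes.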
heavy isotope labeling
- method for quantifying protein expression between two samples
- one protein sample is labeled with a heavy isotope (deuterium), so each labeled peptide moves more slowly through the TOF spectrometer than the corresponding unlabeled fragment
- ICAT reagents are used for uniform labeling of protein mixes after cell extraction
heteroduplex DNA
- dsDNA containing a polymorphism
- formed by renaturing PCR products from two different alleles
heuristic search
- algorithms that use time-saving methods to search for the most likely solution
- reduce the search space by excluding unlikely solutions from the analysis
- not guaranteed to find the optimal solution, but it’s often the only way to perform phylogenetic analysis or sequence alignment involving a large number of sequences