genomic vocab glossary Flashcards
ab initio gene discovery
- a method for identifying genes in a sequence without prior information about the gene (no comparisons with other species and no transcript or gene product data)
- most ab initio approaches use a hidden Markov model to search for sequence features commonly found in genes, such as long open reading frames, intron-exon boundary signatures, and conserved upstream regulatory motifs
ab initio protein structure prediction
- predicting the tertiary structure of a protein (how its helices, sheets, and coils fold) from the amino acid sequence alone
- alternative methods include X-ray crystallography, NMR spectroscopy, and fitting a model by homology
acrylamide
- compound used to make gels for electrophoresis (separation of proteins or nucleic acids)
affinity chromatography
- method for purifying proteins and their complexes based on their affinity for some compound
- this compound is crosslinked to a matrix in a column
- proteins are eluted when a buffer disrupts the interaction between the proteins and the compound on the column
alignment
- lining up two or more DNA/protein sequences
- maximizing # of identical nt/residues
- minimizing # of mismatches and gaps
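The quantities an alignment tries to maximize and minimize can be counted directly. This is a minimal sketch, assuming the two sequences are already aligned, have equal length, and use "-" for gaps; the function name and example sequences are made up for illustration.

```python
def alignment_counts(seq_a, seq_b, gap="-"):
    """Count identities, mismatches, and gap columns in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identities = mismatches = gaps = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            gaps += 1          # a column where either sequence has a gap
        elif a == b:
            identities += 1    # identical nt/residue in both sequences
        else:
            mismatches += 1    # aligned but different
    return identities, mismatches, gaps


# Hypothetical aligned pair: 6 identities, 1 mismatch, 2 gap columns
counts = alignment_counts("GATTACA-G", "GAT-ACTTG")
print(counts)  # (6, 1, 2)
```

Real alignment programs combine these counts into a single score using substitution and gap penalties; this sketch only does the counting.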
alternative splicing
- combination of different sets of exons to make two or more mature mRNAs from the same primary transcript
- observed in higher eukaryotes
- a single gene can create multiple protein isoforms
annotation
- linking information from literature to databases for genes/proteins
- in genome sequencing, annotation refers to the identification of likely genes using a combination of ab initio methods, homology searches, and physical evidence
antibody
- secreted immunoglobulin molecule
- recognizes an epitope of up to 10 aa
- polyclonal Ab = group of different Ab recognizing different epitopes on the same protein
- monoclonal Ab = recognizes a single epitope; made by hybridoma cell lines
association mapping
- search for genes that affect disease susceptibility
- done by testing alleles at DNA polymorphisms to see whether they are present in affected individuals more or less commonly than expected by chance
- linkage disequilibrium (LD) both complicates and helps the mapping process
balancer chromosomes
- chromosomes that have been engineered to contain multiple inversions that suppress crossing over
- used to maintain recessive mutations in genetic stocks
- balancers usually have a recessive lethal marker and a dominant visible genetic marker
base calling
- process of calling the series of nt from a sequence trace
- usually automated, but manual work can resolve ambiguities
Bioconductor project
- project using open-source software written in the R programming language
- used for statistical analysis of genomic data
case-control association mapping
- screening for genetic markers that are associated with disease status
- based on comparing allele frequencies in a group of affected people and in a similar control group of unaffected people
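One common way to compare allele frequencies between cases and controls is a chi-square test on a 2x2 table of allele counts. The sketch below assumes that test (the card itself does not name one), and the counts are invented for illustration.

```python
def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    table = [[case_a, case_b], [control_a, control_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + control_a, case_b + control_b]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: allele A seen 60 times in cases, 40 times in controls
chi2 = allele_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
print(chi2)  # 8.0
```

With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05 for a single marker; screens of many markers need a stricter threshold.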
cDNA
- DNA that is complementary to mRNA
- the first strand of cDNA is synthesized by reverse transcriptase
- cDNA can also be double-stranded (ds)
cDNA clone
- complementary DNA copy of a full length transcript
cDNA library
- collection of cDNA clones (usually isolated from a single tissue)
cDNA microarray
- an array of cDNAs on a glass microscope slide or nitrocellulose filter
- hybridized to labeled cDNA or RNA derived from sample mRNA for profiling gene expression
centiMorgan (cM)
- standard unit of genetic map distance
- corresponds to a 1% probability of a crossover occurring between two sites in any meiosis
chain termination sequencing
- most commonly used method for sequencing DNA clones (up to 1 kb)
- based on a method first devised by Fred Sanger
- molecules of all possible lengths are made by random termination of DNA polymerization when a dideoxynucleotide (ddNTP) is incorporated
chemometrics
- series of analytical methods for quantifying chemical profiles
- includes principal component analysis (PCA) and artificial neural networks
chemostat
- apparatus used for long-term exponential growth of microbial cultures
- fresh medium is introduced at the same time as liquid culture waste is removed
chromatin immunoprecipitation microarrays (ChIP chips)
- microarrays consisting of DNA that corresponds to potential regulatory regions of genes
- used to detect sequences that are bound by transcription factors
chromosome painting
- procedure that aligns the chromosomes of two different eukaryotic species
- based on fluorescence in situ hybridization (FISH)
- a set of chromosome-specific probes from one species is made using unique combinations of fluorescent dyes
- these probes paint the chromosomes in a mitotic chromosome spread from cells of the second species
chromosome walking
- procedure that clones a large contiguous portion of a chromosome
- a probe from the end of one clone is used to identify overlapping genomic clones in a library
- the procedure is repeated until the region of interest is covered
clusters of orthologous genes (COGs)
- sets of genes from a collection of species
- hypothesized to encode the same gene product
- determined by pairwise best-match sequence similarity
complementation group
- a set of alleles that fail to complement (substitute for the function of) one another
- this usually indicates that they are mutations in the same locus
consensus sequence
- a hypothetical sequence that has the most common nucleotide or amino acid at each position in a multiple alignment of DNA or protein sequences
- can be either a DNA or an amino acid sequence
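The "most common symbol at each position" rule can be sketched in a few lines. This assumes the sequences are already aligned to equal length; the function name and the example sequences are invented for illustration. If two symbols tie in a column, the one seen first is returned.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common symbol in each column of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    # zip(*aligned) walks the alignment column by column
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


# Hypothetical aligned DNA sequences
aligned = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
print(consensus(aligned))  # TATAAT
```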
copy number variation (CNV)
- polymorphism in the number of copies of a stretch of DNA
- includes deletions and duplications of whole genes
contig
- a contiguous stretch of cloned DNA
- may refer to:
1) a scaffold of overlapping clones (physically mapped)
2) a long stretch of DNA sequence assembled by merging two or more sequences
cosmid
- large-insert plasmids
- usually exist as a single copy within host bacterial cells
- contain cos sites that allow in vitro packaging of inserts as phage particles if desired
CpG islands
- stretches of vertebrate DNA
- usually 1-2 kb long
- contain a 10x higher frequency of the CG dinucleotide than the genome as a whole
- usually found near the 5’ end of genes
Cre-Lox recombination system
- a combination of a site-specific recombinase (Cre) and its recognition site (lox) from the bacteriophage P1
- engineered into yeast, mouse, and other eukaryotic genomes to facilitate targeted recombination
C-value paradox
- no apparent correlation between the number of genes and the amount of DNA in a genome
- DNA content varies widely even among closely related organisms
- no relationship between organismal complexity and DNA content
cytological map
- map of the location of genes or other DNA features relative to the banding patterns of the chromosomes of a species
data normalization
- process of removing systematic biases from microarray data
- these biases can cause apparent differences in transcript abundance to be misinterpreted
deficiency complementation mapping
- method for fine-scale mapping of QTL based on the variable ability of wild-type (WT) alleles to complement the effects of hemizygosity for a deletion of a gene or genes
- i.e., how different WT alleles behave when the other copy of the gene is missing
dideoxynucleotide
- a nucleotide lacking the OH group at both the 2’ and 3’ carbons of the sugar
- cannot covalently link to the next nucleotide in a growing DNA strand
- used in chain termination sequencing
DNA binding motif
- short stretch of DNA (8-12 nt) that can be recognized by a DNA binding protein
- motifs can be represented by a profile of the frequency of each nucleotide at each position
- these motifs help identify sequences important for gene regulation
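A frequency profile of a motif can be built by tabulating how often each nucleotide occurs at each position across known binding sites. This is a minimal sketch, assuming equal-length sites; the function name and the 8-nt example sites are invented for illustration.

```python
def frequency_profile(sites, alphabet="ACGT"):
    """Per-position frequency of each nucleotide across equal-length binding sites."""
    if not sites or any(len(site) != len(sites[0]) for site in sites):
        raise ValueError("need at least one site, all of the same length")
    profile = []
    for column in zip(*sites):  # one column per motif position
        profile.append({nt: column.count(nt) / len(sites) for nt in alphabet})
    return profile


# Hypothetical binding sites for one DNA binding protein
sites = ["TGACTCAG", "TGAGTCAG", "TGACTCAT", "TGACTAAG"]
profile = frequency_profile(sites)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

Positions where one nucleotide has a frequency near 1.0 are the ones the protein appears to require; positions with spread-out frequencies tolerate variation.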
DNA library
- collection of clones where each clone contains a different segment of genomic DNA or cDNA
- if the clones can be transcribed and translated, it’s an expression library
ectopic expression
- activation of expression of a gene in a cell where it is not usually expressed (abnormal expression)
- can be induced artificially or caused by disease
embryonic stem (ES) cells
- cell line that can be transformed and manipulated in culture
- the cells are then injected into the blastula (early embryo), where they integrate and contribute to the development of the adult animal
- injected embryos are chimeric
- if ES cells populate the germ line, a transgenic organism is produced in the next generation
enhancer
- orientation- and distance-independent regulatory sequence
- increases transcription levels and can occur anywhere in a genome
- can act over 100 kb
- one enhancer can affect the transcription of several genes
enhancer trap
- a transposable element modified to carry a reporter gene
- when it inserts into the genome adjacent to a gene, the enhancer that drives expression of that gene also drives expression of the reporter gene
Ensembl gene browser
- repository and resource for genomic data in Europe
- run by the Sanger Centre (Cambridge) and the European Bioinformatics Institute (part of the European Molecular Biology Laboratory)
epistasis
- two definitions:
- (quantitative genetics) an interaction between two or more loci that results in non-additive effects of one allele as a function of the genotype at the other locus
- (developmental/physiological genetics) describes a mutation whose phenotype is unaffected by another mutation
epitope
- portion of a protein, carb, or other molecule that is specifically recognized by an antibody
E-value
- expected number of sequences in a database that would by chance produce an equivalent or better alignment score than the one under consideration
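For BLAST-style searches, the standard Karlin-Altschul statistics relate the E-value to a normalized bit score S' by E = m * n * 2^(-S'), where m is the query length and n is the total database length. The card does not give this formula, so treat the sketch below as an added illustration; it ignores the effective-length corrections that real programs apply, and the numbers are made up.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2**(-S'), without length corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 250-residue query against a database of 1e9 residues
e_value = expected_hits(bit_score=40, query_length=250, database_length=10**9)
print(e_value)
```

Each additional bit of score halves the E-value, and doubling the database size doubles it, which is why the same alignment looks less significant in a larger database.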
expressed sequence tag (EST)
- sequenced piece of cDNA (subsequence)
- a full-length cDNA defines the structure of a transcript, but an EST is a tag indicating that the particular sequence is part of a transcribed gene
- (online definition) an EST is a tiny portion of an entire gene that can be used to help identify unknown genes and to map their positions within a genome
expression library
- a library of cDNA clones in a vector that allows the gene products to be expressed (transcribed and translated) in a controlled manner
expression vector
- a cloning vector that allows transcription and translation of a cDNA fragment that is inserted into the multiple cloning site
expressivity
- the severity of a disease, OR the degree to which a trait is observed in affected individuals
- often affected by the environment
F3 design
- a genetic screen designed to isolate recessive mutations
- requires that the phenotype be measured in F3 progeny of the mutagenized individual
floxing
- a method for inducing a mutation at a precise time and place in an organism
- a mouse is engineered (by homologous recombination) to carry loxP sites on both sides of an exon of the gene to be mutated
- this mouse is crossed to a strain that expresses Cre recombinase in the tissue of interest
- the exon is excised only in that tissue
fold recognition
- method for predicting the tertiary structure of a protein
- uses limited sequence similarity and predicted secondary structure to find the previously described domain fold that most closely fits the unknown protein
forward genetics
- genetic analysis that starts with the phenotype and moves towards isolation of the gene that causes it
- phenotype => gene
functional genomics
- study of the function of each and every gene
- e.g., biochemical activity, cell biology function, organismal function
- includes genetic analysis, microarrays, proteomics, and computational biology
fusion protein
- hybrid protein made by fusion of two genes in an expression vector
- N terminal = tag (polyhistidine or glutathione S-transferase)
- C terminal = protein of interest
GAL4
- a potent transcription factor from yeast that enhances gene expression only through a UAS sequence adjacent to the promoter
- without a UAS sequence, GAL4 has no effect on transcription in heterologous genomes
- the GAL4-UAS system is used to drive expression of transgenes introduced into such a genome
gene knock-in
- replacement of the endogenous gene with a different functional piece of DNA
- the inserted gene is expressed in place of the original gene
- germline gene therapy uses gene knock-in to replace a defective gene with an active copy
- the replacements are performed using a positive-negative double selection strategy in ES cells
gene knock outs
- a mutation that targets a specific gene, made by using homologous recombination to replace an exon of the target gene with a piece of foreign DNA (e.g., a lacZ reporter gene)
- insertional mutations can also cause gene knock outs
genetic fingerprinting
- strategy for testing subtle effects of mutations on the fitness of microbial strains in competition with other strains during long-term culture
genetic heterogeneity
- the observation that the same disease or phenotype can have multiple different genetic causes
- allelic heterogeneity = the different causal variants lie within a single locus
genetic map
- map of the order of and distance between genes, based on recombination frequency between markers
- distances are given in cM (centiMorgans)
- markers can be physical (molecular variants) or visible (Mendelian loci)
- mapping populations may be pedigrees, crosses between lines, or radiation hybrid cell panels
genome-wide association study (GWAS)
- a study designed to scan the entire genome for SNPs and CNVs that are associated with a disease or trait
- at least 500k different genetic variants are measured in several thousand disease cases and a similar number of healthy controls
germ line
- the population of cells in eukaryotes that are destined to undergo meiosis to become oocytes or sperm
- the germ line is set aside very early in animal development
- in plants, the germ line is specified at the time of flowering
haplotype
- multi-site genotype of two or more polymorphisms on the same chromosome
- e.g., an individual who is homozygous for the G allele at one site and heterozygous for A and T at a nearby site has GA and GT haplotypes
Hardy-Weinberg equilibrium
- expectation that genotype frequencies in a population will tend to be stable and predictable as a simple function of individual allele frequencies
- equilibrium is broken by evolutionary forces (such as migration, inbreeding, mutation, or selection)
- these forces can lead to an increase or decrease in the number of heterozygotes
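For a locus with two alleles at frequencies p and q = 1 - p, the "simple function" is the familiar p^2, 2pq, q^2. A minimal sketch, assuming a biallelic locus; the allele labels A and a and the example frequency are illustrative.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at a biallelic locus with allele frequency p for A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequency of 0.7 for A
freqs = hardy_weinberg(0.7)
print(freqs)  # roughly AA 0.49, Aa 0.42, aa 0.09
```

Comparing observed heterozygote counts with the 2pq expectation is a common way to spot the departures from equilibrium listed above.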
heavy isotope labeling
- method for quantifying protein expression between two samples
- one sample is labeled with a heavy isotope (deuterium), so each of its peptide fragments moves more slowly through a TOF spectrometer than the corresponding unlabeled fragment
- ICAT reagents are used for uniform labeling of protein mixes after cell extraction
heteroduplex DNA
- dsDNA containing a polymorphism (the two strands differ at that site)
- formed by renaturing PCR products from two different alleles
heuristic search
- algorithms that use time-saving methods to search for the most likely solution
- reduce the search space by excluding unlikely solutions from the analysis
- not guaranteed to find the optimal solution, but often the only practical way to perform phylogenetic analysis or sequence alignment involving a large number of sequences