quiz 4 information Flashcards
Genome
- an organism's complete set of DNA, including chromosomal and mitochondrial DNA
Genomics
- study of the genomes of organisms
- determining the entire DNA sequence of organisms; genetic mapping
DNA sequencing
- determining the precise order of nucleotides within a DNA molecule
Methods of DNA sequencing
- Sanger chain-termination
- high-throughput methods (next generation)
high-throughput methods
- next generation sequencing
- main difference between this and Sanger sequencing is volume
- massively parallel, sequencing millions of fragments simultaneously per run
- real time sequencing, does not require lengthy electrophoresis
dNTP vs ddNTP
- dNTP = deoxygenated at the 2’ carbon, with a hydroxyl at the 3’ carbon
- ddNTP = has hydrogen instead of hydroxyl at both the 2’ and 3’ carbons; without a 3’ hydroxyl it cannot form a phosphodiester bond at the 3’ end
What is Sanger chain termination?
- in vitro DNA replication reactions
- includes the DNA to be sequenced, a DNA primer, a DNA polymerase, normal dNTPs (high concentration), and modified ddNTPs (smaller amount); many identical copies of the DNA to be sequenced, generated by cloning, are used in each reaction
- produces a large number of partial replication products, each terminated by incorporation of a ddNTP at a different site in the sequence
what are the four standard nucleotides added to Sanger sequencing reactions?
What are the four modified nucleotides that could be added?
- normal: dATP, dGTP, dCTP, dTTP
- modified: ddATP, ddGTP, ddCTP, ddTTP
DNA sequence reactions with ddNTP
- incorporation of dCTP allows chain to continue growing
- incorporation of ddCTP terminates chain elongation
- partial replication products terminate at each cytosine of the growing chain because of ddCTP incorporation; the C reaction mixture therefore contains DNA fragments of different lengths, one for each C position
what to do following a Sanger sequencing reaction
- contents of each reaction are electrophoresed and separated by length
- consecutive nucleotides are identified by the gel lane in which each successively longer DNA fragment is located
- can then determine complementary strand
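The chain-termination and gel-reading steps can be sketched in a few lines of Python. This is a toy simulation, not a lab protocol: the template sequence and the function names are made up for illustration, the primer is ignored, and fragment lengths are counted from the first base added after the primer.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def new_strand(template):
    """Strand the polymerase synthesizes: reverse complement of the template (both written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def termination_lengths(template, dd_base):
    """Lengths of the partial products in the reaction that contains dd<dd_base>TP."""
    strand = new_strand(template)
    return [i + 1 for i, base in enumerate(strand) if base == dd_base]

def read_gel(template):
    """Run the four reactions, sort fragments by length, and read one base per fragment."""
    lanes = {base: termination_lengths(template, base) for base in "ACGT"}
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))

template = "GATTCAGC"            # hypothetical template, written 5'->3'
read = read_gel(template)        # sequence of the newly synthesized strand
print(termination_lengths(template, "C"))  # fragment lengths in the C lane
print(read)
print(new_strand(read))          # complementary strand, i.e. the original template
```

Each lane holds only the fragments ending in one base, so reading the lanes from shortest to longest fragment gives the new strand, and its complement gives the template.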
bioinformatics
- application of computational methods to the storage and analysis of biological data
- study of large sets of biodata
computational biology
- emphasizes development of theoretical methods, computational simulations and mathematical modeling
- use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships
Big Data
- large data sets that cannot be processed by traditional approaches
- data mining
data mining
- analysis of the large data sets for useful information
Primer walking
- potentially useful for a large DNA fragment, but not efficient for genome projects
- primer allows ends of clone to be sequenced from both sides
- new primers are designed based on newly obtained sequence
- procedure is repeated until the sequences from both ends overlap
shotgun sequencing
- genomes must be broken into small fragments and the pieces sequenced in parallel; shotgun sequencing means sequencing everything at the same time, then piecing it together at the end
- clone by clone sequencing
- whole genome shotgun sequencing
clone by clone sequencing
- chromosomes broken into overlapping fragments that are arranged in linear order to produce a map
whole genome shotgun sequencing
- DNA of the entire genome is fragmented, and pieces are chosen at random and sequenced
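The "piecing it together at the end" step can be illustrated with a greedy overlap merge. This is only a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats); real assemblers are far more involved, and the reads and function names below are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Three made-up overlapping reads, given in random order.
reads = ["TCTAGGCA", "ATGCCTGA", "TGAAGTCT"]
print(greedy_assemble(reads))
```

Reads that share no overlap are returned as separate contigs, which mirrors gaps left in a real assembly.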
Human genome project outcomes
- number of protein encoding genes did not correlate with complexity as expected
- prevalence of other mechanisms that increase complexity and variation without increasing the number of coding genes
- recognition of complex regulatory networks
- so-called junk DNA serves a purpose
metagenomics
- study of genetic material recovered directly from environmental samples
- with sequencing and computational techniques, one can collect a sample from the environment, put it in a test tube, isolate and sequence the DNA, and determine what's there
- discovered many species that had never been cultured
Sargasso Sea metagenomics project
- environmental genomic shotgun sequencing was performed on DNA isolated from microorganisms found in the Sargasso Sea
found:
- 1800 different genomes
- one million new protein-coding genes
Microbiomes
- entire habitat including microorganisms, their genomes, and the surrounding environmental conditions
microbiota
- microorganisms in a particular environment
metagenome in terms of microbiome
- combined genomes and genes of the microorganisms in a particular environment
We have a genome sequence… now what do we do with it?
- annotate it to describe the genes
annotation
- attaching biological functions to DNA sequences, based on experimental evidence or computational analysis
genome annotation
- identification of important components in genomic DNA: location of genes and functional sequences
gene annotation
- defines the biochemical, cellular and biological function of each gene product
experimental approaches to genome annotation
- compare cDNA to genomic sequence to identify sequences of a genome that undergo transcription leading to production of mRNA molecules.
- large amounts of cDNA are available, allowing for partial/complete assembly of gene transcripts
- comparing these allows accurate annotation of gene exons/introns
- expressed sequence tags (ESTs): short sequences of mRNA-derived fragments, obtained through single sequencing reactions performed on randomly selected clones from cDNA libraries
computational approaches to genome annotation
- gene (reading frame) identification
- identification of non-coding genes (for ncRNA)
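Reading frame identification can be sketched as a scan for a start codon followed by an in-frame stop codon. The version below is a simplified illustration, not a real gene finder: it checks only the three forward-strand frames, assumes ATG starts and the standard stop codons, and uses a made-up sequence and function name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop open reading frame on the given strand.

    start and end are 0-based slice positions, so dna[start:end] includes the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF in this frame
    return orfs

dna = "CCATGAAATTTTAGGG"   # hypothetical sequence
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])
```

A fuller version would also scan the reverse complement to cover all six reading frames.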
Gene families
- groups of genes that are functionally and/or evolutionarily related. members of a family will have high sequence similarity
protein families
- groups of proteins that are evolutionarily related, have related functions and similarities in their sequence and structure
domains
- many proteins are modular, built from distinct domains that are joined together; a particular protein domain may be found in numerous genes
- some genes may be very similar because they share functional domains
domains
- regions of a protein that have a specific function and can usually function independently of the rest of the protein
motif
- similar 3d structure conserved among different proteins that serve a similar function
ex: helix-turn-helix motif
what is the difference between domains and motifs?
motif:
- arrangement of secondary structures of a protein molecule
- not stable on its own
- no independent functional role
domain:
- three-dimensional fundamental and functional unit of a protein
- stable by itself
- functional unit of the protein
examples of gene families
- receptor tyrosine kinases
- MAP kinase
- G-proteins
- SOX gene family (TFs)
- immunoglobulin superfamily
- ABC transporters
- ion channels
examples of protein domains
- SH2
- immunoglobulin
- fibronectin type 3
- kringle
- Ca2+ binding
- protease domain
- UBA (ubiquitin-associated domain)
genome annotation of human chromosome 21
-
Why do bacteria have fewer genes but higher gene density (protein encoding genes) than eukaryotes?
- lack of introns
- compact gene regulatory sequences
- lower complexity of protein structure
does gene number determine an organism's complexity?
no, complexity is in the network
evolutionary genomics
- comparative study of genomes…
- provides clarity to the tree of life (phylogenetic tree)
- genes encoding rRNAs provide a universal sequence for comparison
Carl Woese
- classified life into three domains: Eukarya, Archaea and Bacteria
- one of the originators of the RNA world hypothesis
- work on horizontal gene transfer as a primary evolutionary process
transcriptomics
- study of gene expression (transcription) from a genomic perspective
transcriptome
- set of transcripts present in a cell or organism
(identity and quantity of RNA in a biological sample at a given moment)
What methods are used to determine the transcriptome?
- 1: DNA microarrays: the old way… widely used in data mining
- 2: high-throughput sequencing (RNA-Seq)
- RNA-seq is the dominant method
- data mining
microarrays
- array of spots on a microscope slide; oligonucleotides are fixed to the spots, and the spots represent different genes, with enough spots on the slide to represent all genes in the entire genome
- isolate RNA from the sample, copy it into cDNA carrying a fluorescent label, and mix the labeled cDNA with the slide
- bathe the slide with the cDNA; each cDNA anneals to its complementary spot and binds quantitatively
- more cDNA binding = higher quantity of transcript
- measure transcript levels by brightness of spots
method for interpreting microarray
- raw data is converted to expression in one cell type relative to another
- heat map
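One common way to express one cell type relative to another is a log2 ratio of spot intensities, which is then coloured in a heat map. The sketch below assumes that convention (the card itself does not name a formula), and the gene names, intensities, and function names are invented for illustration.

```python
import math

def log2_ratios(sample, reference):
    """log2(sample / reference) for every gene measured in both cell types."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference and sample[gene] > 0 and reference[gene] > 0
    }

def heat_symbol(ratio, threshold=1.0):
    """Crude text stand-in for a heat-map colour."""
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "same"

# Hypothetical spot intensities (arbitrary units) for three made-up genes.
sample = {"geneA": 800.0, "geneB": 100.0, "geneC": 250.0}
reference = {"geneA": 200.0, "geneB": 400.0, "geneC": 250.0}

for gene, ratio in log2_ratios(sample, reference).items():
    print(gene, round(ratio, 2), heat_symbol(ratio))
```

A ratio of +2 means four times more transcript in the sample than in the reference, and -2 means four times less.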
RNA-seq
- uses next generation methods
- sequences thousands of cDNAs in a sample simultaneously
- a computer takes all the sequences and assigns each one to a gene
- faster, cheaper
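The step where a computer assigns each sequence to a gene can be sketched as counting reads per gene. This toy version uses exact substring matching against made-up transcript sequences; real pipelines use dedicated alignment software, so treat the names and the matching rule here as assumptions.

```python
from collections import Counter

def count_reads(reads, transcripts):
    """Count reads per gene, keeping only reads that match exactly one transcript."""
    counts = Counter({gene: 0 for gene in transcripts})
    for read in reads:
        hits = [gene for gene, seq in transcripts.items() if read in seq]
        if len(hits) == 1:          # skip unmatched and ambiguous reads
            counts[hits[0]] += 1
    return dict(counts)

# Hypothetical reference transcripts and sequenced cDNA reads.
transcripts = {
    "geneA": "ATGGCCATTGTAATGGGCCGC",
    "geneB": "ATGTTTCCCGGGAAATAGCTA",
}
reads = ["GCCATTGT", "GTAATGGG", "TTTCCCGG", "ATG", "CCCCCCCC"]

print(count_reads(reads, transcripts))
```

More reads assigned to a gene indicates more of its transcript in the sample, which is the RNA-seq counterpart of a brighter microarray spot.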
Use of data mining
- data is sent to a database where other investigators can harvest the information
- there are different databases and algorithms, and they won't always give the same information; results from each database can be compared
proteomics:
- study of all the proteins expressed in a cell
- looking for proteins involved in a specific biological activity
techniques used for proteomics:
- 2D gel electrophoresis
- mass spectrometry
- affinity chromatography
- two hybrid system
2D gel electrophoresis
- electrophorese samples in 2 dimensions
- 2 parameters: size (by SDS-PAGE) and charge (by isoelectric focusing)
- apply a pH gradient; proteins migrate until they reach the pH at which they carry no net charge (their isoelectric point) and stay there; then lay the strip across an SDS-PAGE gel
affinity chromatography
- co-IP on a column to identify putative interacting partners
- used to isolate interacting partners, followed by mass spectrometry to identify the purified proteins
interactome
- whole set of molecular interactions in a particular cell
- wiring diagram
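The wiring-diagram idea maps naturally onto a graph, with proteins as nodes and interactions as edges. A minimal sketch, assuming a list of pairwise interactions such as those found by co-IP or two-hybrid screens; the protein names P1 to P4 and the function name are placeholders.

```python
from collections import defaultdict

def build_interactome(pairs):
    """Build an undirected interaction graph: protein -> set of interacting partners."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

# Hypothetical pairwise interactions.
pairs = [("P1", "P2"), ("P1", "P3"), ("P3", "P4"), ("P1", "P4")]
network = build_interactome(pairs)

# The protein with the most partners is a candidate "hub" of the network.
hub = max(network, key=lambda protein: len(network[protein]))
print(hub, sorted(network[hub]))
```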
Omics refers to…
collective techniques used to explore actions of various types of molecules that make up the cells of an organism