Chapter 1 Long questions Flashcards
Contrast between the challenges of gene identification in prokaryotes vs eukaryotes
- Gene identification is easier in prokaryotes than in eukaryotes
- Prokaryotes have smaller genomes, fewer genes, contiguous genes that lack introns, and small intergenic regions
- Eukaryotes have sparsely distributed genes, most of which contain introns; alternative splicing further complicates gene identification
Distinguish between two general methods of gene identification
- A priori method: recognize sequence patterns within expressed genes and the regions flanking them
- "Been there, seen that" method: recognize regions corresponding to previously known genes
Describe useful features of gene identification in addition to codon usage, what to look for in the beginning, middle and end of genes
- Beginning of a gene (5’ end):
> 5’ exons start at a transcription start site, preceded by a core promoter (e.g. a TATA box); they are free of in-frame stop codons and end immediately before a GT splice signal
- Middle of a gene:
> Internal exons begin immediately after an AG splice signal, are free of in-frame stop codons, and end immediately before a GT splice signal
- End of a gene (3’ end):
> 3’ exons start immediately after an AG splice signal and end with a stop codon (TAA, TAG, or TGA), followed by a polyadenylation (poly-A) signal sequence
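The internal-exon features above can be sketched as a simple check. This is only a toy illustration, not a real gene finder: the function names, the 0-based coordinates, and the toy sequence are all assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_in_frame_stop(seq, frame=0):
    """Return True if any complete codon in the given reading frame is a stop codon."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False

def looks_like_internal_exon(genome, start, end, frame=0):
    """Check genome[start:end] (0-based, end exclusive) for the three features:
    preceded by AG, followed by GT, and free of in-frame stop codons."""
    genome = genome.upper()
    preceded_by_ag = start >= 2 and genome[start - 2:start] == "AG"
    followed_by_gt = genome[end:end + 2] == "GT"
    no_stop = not has_in_frame_stop(genome[start:end], frame)
    return preceded_by_ag and followed_by_gt and no_stop

# Hypothetical toy sequence: intron end (..ag), a 9-base exon, intron start (gt..)
toy = "ttcag" + "ATGGCCAAA" + "gtaag"
print(looks_like_internal_exon(toy, 5, 14))
```

Real gene finders combine many such signals statistically; a single AG or GT dinucleotide occurs far too often by chance to identify an exon on its own.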
Reasons for sequencing non-human genomes
- Reveal and illuminate the processes of evolution
- Help us understand the functions of different regions in the human genome
- Help us understand the genomes of pathogens that exhibit antibiotic resistance for better human welfare
- Improving plants and animals
- Conservation of endangered species
Clinical applications in humans, genetic testing for diseases, genealogy, law enforcement, mutation discovery
The relationship between SNPs and haplotypes: how does recombination affect this relationship, and what factors affect the incidence of recombination?
- SNPs (single nucleotide polymorphisms) are mutations at a single base position in a genetic sequence
- Haplotypes are local combinations of genetic polymorphisms that tend to be co-inherited as a block
- Some SNPs are co-inherited as blocks
> Discrete combinations of SNPs in recombination-poor regions define a haplotype
- Mutations on different chromosomes will be separated by independent assortment within one generation
- Mutations on the same chromosome will be separated by recombination
- Recombination frequency depends on the distance between loci and varies across the genome, which contains recombination hot spots and cold spots
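A minimal sketch of how haplotypes show up in data, assuming phased chromosomes written as strings of alleles at the same SNP sites. The sample data and the function name are hypothetical and chosen only for illustration.

```python
from collections import Counter

def count_haplotypes(chromosomes):
    """Count how often each combination of SNP alleles (haplotype) occurs."""
    return Counter(chromosomes)

# Hypothetical phased alleles at three biallelic SNP sites on eight chromosomes
sample = ["ACT", "ACT", "GTC", "ACT", "GTC", "GTC", "ACT", "GTC"]
haplotypes = count_haplotypes(sample)
n_sites = len(sample[0])

print(haplotypes)
print(f"{len(haplotypes)} of {2 ** n_sites} possible combinations observed")
```

In a recombination-poor region only a few of the possible allele combinations are observed, as in this toy sample; frequent recombination between the sites would shuffle the alleles and produce more of the possible combinations.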
International HapMap project - describe and elaborate each main finding
- Most variations appear in all populations sampled
> Some of the inter-population differences reflect different relative frequencies of the same SNPs
- Very few SNPs are unique to specific populations
> Only 11 were consistently different between all individuals of European origin and all individuals of Chinese and Japanese origin
- Genomes of individuals from Japan and China are very similar
> They share a more recent common ancestor
- The X chromosome varied more between populations than the other chromosomes
> X chromosomes recombine in females only (faster-X effect)
- The length of haplotype blocks varied among the sources of samples
> They tend to be shorter in African populations: the older the population, the more opportunities recombination has had to break up blocks (Out of Africa theory)
Implications of little vs high genetic variation in populations
Population - an interacting and interbreeding group of individuals of the same species inhabiting the same geographical area
- A population that has passed through a bottleneck or that has developed in isolation from a small “founder” group will show very little variation
- A population with relatively high variation is likely to have a longer evolutionary history (Out of Africa theory)
Role of genomics in delineating species
- Genomics raised opportunities and challenges to the idea of species
- Pro: DNA sequences of members of different species differ more than the variation among individuals of the same species
- Con: the extent of sequence divergence needed to distinguish species varies widely across taxa, so genomic distance alone does not settle the question
> e.g. in microbiology, bacterial species are those that maintain > 97% sequence identity in 16S rRNA
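The 97% criterion can be illustrated with a percent-identity calculation. This is a sketch under strong assumptions: the two sequences are already aligned to the same length, the toy sequences are made up, and both function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

def same_species_by_16s(a, b, threshold=97.0):
    """Apply the > 97% identity rule of thumb to two aligned 16S rRNA sequences."""
    return percent_identity(a, b) > threshold

# Hypothetical 100-base toy sequences (real 16S rRNA genes are about 1,500 bases)
reference = "ACGT" * 25
close = "TT" + reference[2:]       # 2 mismatches
distant = "CATGC" + reference[5:]  # 5 mismatches

print(percent_identity(reference, close), same_species_by_16s(reference, close))
print(percent_identity(reference, distant), same_species_by_16s(reference, distant))
```

The threshold is a convention for bacteria; as the card notes, the same cut-off would not carry over to other taxa.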
Causes of extinction
- Excessive hunting
- Pathogen extermination
- Habitat destruction
- Natural events, e.g. elm yellows, white-nose syndrome, transmissible cancer
- Climate change
- Species translocations
- Environmental pollution
- Indirect effects due to species interdependence.
The mechanisms of CRISPR/Cas
Clustered regularly interspaced short palindromic repeats.
- Regions of prokaryotic genomes used as a defense against viral infection; similar to vertebrate immune systems in that they are responsive, and unlike them in that they are heritable
- When a phage infects a bacterial cell, regions of the phage DNA are clipped out, replicated, and integrated into a CRISPR locus as new spacers between the repeats
- Transcription of that region produces CRISPR RNAs that bind to Cas proteins
- Using the bound RNAs as probes, the Cas proteins clip DNA from an invading virus when it matches a stored sequence, thereby defending the cell
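The probe-and-clip step can be sketched as a string search. This is a deliberately simplified toy model: it assumes exact matches only, ignores requirements of real systems such as an adjacent PAM sequence, and uses made-up sequences and hypothetical function names.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_targets(spacers, invading_dna):
    """Return (spacer, position, strand) for each exact spacer match in the invading DNA."""
    dna = invading_dna.upper()
    hits = []
    for spacer in spacers:
        probes = (("+", spacer.upper()), ("-", reverse_complement(spacer)))
        for strand, probe in probes:
            pos = dna.find(probe)
            while pos != -1:
                hits.append((spacer, pos, strand))
                pos = dna.find(probe, pos + 1)
    return hits

# Hypothetical spacer stored in the CRISPR locus after an earlier infection
stored_spacers = ["GATTACAGG"]

print(find_targets(stored_spacers, "TTTGATTACAGGCCC"))  # same phage returns
print(find_targets(stored_spacers, "AAAAAAAAAAAAAAA"))  # unrelated DNA
```

A non-empty result stands for the case where Cas proteins would cut the invading DNA; an empty result means the cell holds no record of that invader.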
Potential application of CRISPR/Cas
- Gene knockout and editing
> Gene knockout - silencing gene expression
> Gene editing - replacing or inserting DNA sequences in an endogenous gene
- Gene drive
> Using CRISPR/Cas to accelerate the dispersal of a chosen gene throughout a population
> A possible weapon against a vector that carries the Zika virus.
Distinguish between high-throughput sequencing projects
- De novo sequencing - determining the full genome sequence without using a known reference sequence from an individual of the same species
- Resequencing - determining the sequence of an individual of a species for which a reference genome sequence is already known
- Exome sequencing - a resequencing project that sequences only exons/coding regions, in order to identify trait-linked mutations in protein-coding regions
Describe the contents of the human genome and functional roles (protein coding regions and other main regions)
- Protein-coding regions - areas in the genome that encode proteins; these proteins perform various cellular activities, either as enzymes or as structural components
- Non-protein-coding regions - areas in the genome that produce different kinds of RNA molecules (e.g. tRNA, siRNA, miRNA, rRNA); these RNAs perform various structural and regulatory roles, and many control gene expression (e.g. miRNA and siRNA)
- Pseudogenes - degenerate genes that have acquired many deleterious mutations over time; they no longer encode functional proteins
> Processed pseudogenes - picked up from mRNA by a virus and reverse transcribed; they tend to lack introns and promoters
- Binding sites for ligands - regions (e.g. promoters) that are targeted by various DNA-interacting proteins such as enzymes or regulatory proteins; the ligands tend to perform regulatory roles, as in the case of transcription factors activating or repressing gene expression
- Repetitive elements - repeated DNA sequences that tend to constitute the largest component of the genome; some are functional, such as rRNA genes, while the majority are non-functional, such as minisatellites or microsatellites (15%), LINEs (21%), and SINEs (13%)
Characteristics of protein coding genes and specific examples
- There are about 23,000 protein-coding genes in the human genome
- They occupy a small fraction of the genome, 2-3% of the overall sequence
- They are distributed unevenly across chromosomes and appear on both strands
- Many appear in multiple copies, either identical or diverged into families through duplication and divergence (e.g. over 900 related olfactory-receptor genes). Closely related genes may co-localize (e.g. hemoglobin genes), and some genes may occur on different chromosomes (e.g. ubiquitin)
- Some chromosomes are gene-rich (e.g. chromosomes 19 and 22) and some are gene-poor (e.g. subtelomeres, chromosomes 18 and X)
- Protein-coding genes in humans typically contain coding (exon) and non-coding (intron) regions. Splice-signal sites delineate the borders between the coding and non-coding regions. On average exons are roughly 200 bp in length, while introns vary in size, leading to diversity in protein-coding gene lengths
Characteristics of protein-coding genes in the human genome: Gene structure
- Protein-coding genes in the human genome typically contain coding (exon) and non-coding (intron) regions
- Splice-signal sites delineate the borders between the coding exon and non-coding intron regions
- Exons are roughly 200 bp in length; introns vary in size, leading to diversity in protein-coding gene lengths
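A small worked example of how intron size drives gene length, assuming a made-up gene described by exon coordinates (1-based, inclusive). The coordinates and the function name are hypothetical and not taken from any real gene.

```python
def gene_structure(exons):
    """Given sorted, non-overlapping (start, end) exon intervals,
    return the exon lengths and the lengths of the introns between them."""
    exon_lengths = [end - start + 1 for start, end in exons]
    intron_lengths = [next_start - prev_end - 1
                      for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return exon_lengths, intron_lengths

# Hypothetical gene: three 200 bp exons separated by introns of different sizes
toy_gene = [(1, 200), (1201, 1400), (6401, 6600)]
exon_lengths, intron_lengths = gene_structure(toy_gene)

gene_length = toy_gene[-1][1] - toy_gene[0][0] + 1
coding_fraction = sum(exon_lengths) / gene_length

print(exon_lengths)    # exon sizes in bp
print(intron_lengths)  # intron sizes in bp
print(f"{coding_fraction:.1%} of the gene is exon sequence")
```

In this toy gene the exons are all the same size, so the total length is set almost entirely by the introns, which is the point the card makes about variation in gene lengths.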