Chapter 1: Introduction to Genomics Flashcards
How many base pairs does the human genome consist of
3.2 x 10^9 base pairs, distributed among 22 pairs of autosomes and 1 pair of sex chromosomes
Define a genome
All the genetic material of an organism. In humans this is nuclear and mitochondrial DNA; in plants it is nuclear, mitochondrial and chloroplast DNA
Define genomics
Study of genomes
How many protein coding genes are there in the human genome
About 23,000
What are the static and dynamic components of the genome
The static component is the DNA sequence itself, which stays essentially constant through an individual's life. The dynamic component is the network of regulatory interactions that integrates the activities of the individual components (which genes are expressed, where and when)
What is the phenotype equation
Phenotype = genotype + environment + life history + epigenetics (genotype x life experiences)
Define phenotype
The collection of observable traits: macroscopic (e.g. hair, height) and microscopic (e.g. whether the individual has, or is at risk of developing, sickle-cell anemia)
Define genotype
Genetic constitution: your DNA sequence (both nuclear and mitochondrial)
What is pharmacogenomics
The use of an individual's DNA sequence to personalize the prevention and treatment of disease, in particular the choice and dosage of drugs
Define life history
The integrated total of your experiences and of the physical and psychological environment in which you grew up, including your in utero environment. Physical development depends on your nutritional history; mental development depends on your educational history and the nurture you were given
Define environment
Anything other than the gene that is being looked at, including other genes
Epigenetic factors
Genotype x life experiences. Your parents' experiences might have altered the epigenetic patterns in their cells, creating “pre-differentiation” signals that might have been passed on to you. Another example: all cells, except eggs, sperm and immune cells, have the same DNA sequence, yet different sets of genes are expressed and silenced in different parts of the body
Does the genome determine the features of an organism
No, the genome constrains but does not determine the features of an organism
What leads organisms to explore different states within their genome, provide an example
Different surroundings and experiences, e.g. the lac operon in E. coli, which is switched on or off depending on which sugars are available
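The lac operon example can be written as a simple on/off rule. This is a simplified textbook sketch that ignores leaky basal expression and intermediate levels; the function name `lac_operon_expressed` is hypothetical. Strong expression needs lactose present (so the lac repressor is released from the operator) and glucose absent (so cAMP-CAP activates the promoter).

```python
def lac_operon_expressed(lactose_present: bool, glucose_present: bool) -> bool:
    """Simplified on/off model of the E. coli lac operon (hypothetical helper).

    - Without lactose, the lac repressor stays bound to the operator.
    - With glucose present, cAMP is low, so CAP does not strongly activate the promoter.
    """
    repressor_released = lactose_present
    cap_activated = not glucose_present
    return repressor_released and cap_activated


# The same genome gives different states in different environments.
for lactose in (False, True):
    for glucose in (False, True):
        state = "ON" if lac_operon_expressed(lactose, glucose) else "off"
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```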
Give an example of how environmental effects have long-term effects on development
Exercise contributes to physical development. In people with phenylketonuria, a diet low in phenylalanine prevents the damage the disease would otherwise cause. Malnutrition, disease and injury also affect development negatively
How have researchers measured the importance of the different factors that contribute to a phenotype
Through controlled experiments with genetically identical organisms: in humans, monozygotic (identical) twins; in plants, clones. Scientists compare individuals who have the same genes but have been exposed to different environments. Less controlled studies include comparisons between identical and fraternal twins, or between non-twin siblings and adopted children in the same family
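One classic way to turn twin comparisons into numbers is Falconer's formula, which is not named in these notes and is shown here only as an illustrative, simplified technique. It assumes identical and fraternal twins share their environments equally and that genetic effects are additive. The correlations below are made up.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> dict:
    """Partition trait variance with Falconer's formula (simplified twin model).

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    """
    h2 = 2 * (r_mz - r_dz)   # genetic contribution (heritability)
    c2 = r_mz - h2           # shared family environment (= 2*r_dz - r_mz)
    e2 = 1 - r_mz            # unique environment plus measurement error
    return {"genetic": h2, "shared_env": c2, "unique_env": e2}


# Hypothetical correlations for a trait such as height.
print(falconer_heritability(r_mz=0.8, r_dz=0.5))
```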
What new information about monozygotic twins changes the method used to measure the importance of the different factors that contribute to a phenotype
Their DNA sequences are not completely identical: mutations arising after the embryo splits cause variation between the twins at the DNA level
List the differences in genome organization in a prokaryotic cell vs. a eukaryotic cell for the following features: size, subcellular division, state of genetic material, cell division method, internal differentiation that contains DNA
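Prokaryotic cell: size ~10 µm; no nucleus; genetic material in a circular loop with a few proteins attached (the nucleoid); division by fission; no internal differentiation.
Eukaryotic cell: size ~0.1 mm; nucleus present; genetic material in chromosomes; division by mitosis or meiosis; internal differentiation into nucleus, mitochondria, ER, Golgi complex and cytoskeleton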
Name and describe the functional regions of genes
1.) Protein-coding regions: 3 bases code for 1 amino acid. Any nucleic acid sequence can be translated into an amino acid sequence in 6 ways: 3 reading frames (starting at the first, second or third base) on each of the 2 strands. A protein-coding region will contain an open reading frame (ORF) in one of these six reading phases. An ORF is a potential protein-coding region of about 100 bp that begins with a start codon (ATG, AUG in mRNA) and ends with a stop codon (TAA, TGA or TAG).
In prokaryotes, a gene is a contiguous region of DNA. In eukaryotes, the coding regions of genes (exons) may be interrupted by non-coding regions called introns, which must be removed to form the mature mRNA that specifies the amino acid sequence of the protein.
2.) Some regions are expressed as non-protein-coding RNA: these show regions of local self-complementarity corresponding to hairpin loops, e.g. tRNA genes contain the signature cloverleaf pattern
3.) Other regions are targets of regulatory interactions (sequence motifs, e.g. the TATA box)
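The six-frame search for ORFs can be sketched in Python. This is a minimal sketch, assuming a plain DNA string, ATG as the only start codon, and a hypothetical minimum-length parameter `min_codons`; real gene finders also use codon statistics, splice signals and similarity to known proteins.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Find ORFs (ATG ... stop) in all six reading frames.

    Returns (strand, frame, start, end, orf) tuples; start/end index the strand
    that was scanned ('+' = given strand, '-' = its reverse complement).
    ATGs nested inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):                      # 3 frames x 2 strands = 6 ways
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                       # open a candidate ORF
                elif start is not None and codon in STOPS:
                    end = i + 3                     # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, end, s[start:end]))
                    start = None                    # close it and keep scanning
    return orfs


# Toy example with a deliberately tiny minimum length.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

The toy sequence contains one ORF, ATG AAA TTT GGG TAA, in the third frame of the given strand.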
Is gene identification easier in prokaryotes or eukaryotes and why
It is easier in prokaryotes because prokaryotic genomes are smaller and contain fewer genes, bacterial genes are contiguous (they lack introns), the intergenic spaces are smaller (about 90% of the E. coli genome is protein-coding), and ribosome-binding sites are conserved. In eukaryotes, by contrast, protein-coding genes are sparsely distributed and interrupted by introns, and assembling the exons correctly is a problem
What are the 2 basic approaches to identifying genes in genomes
- ) a priori methods: seek to recognize sequence patterns within expressed genes and in the regions flanking them. Protein-coding regions have distinctive patterns of codon statistics, including the absence of in-frame stop codons
- ) ‘been there, seen that’ methods: recognize regions corresponding to previously known genes, from the similarity of their translated amino acid sequences to known proteins in other species, or by matching expressed sequence tags (ESTs)
List the characteristics that are useful in identifying eukaryotic genes
> The initial (5’) exon starts with a transcription start point, preceded by a core promoter site (about 30 bp upstream). It is free of in-frame stop codons and ends immediately before a GT splice signal (sometimes a non-coding exon precedes the exon that contains the initiator codon)
Internal exons are free of in-frame stop codons. They begin immediately after an AG splice signal and end immediately before a GT splice signal
The final (3’) exon starts immediately after an AG splice signal and ends with a stop codon (sometimes a non-coding exon follows the exon that contains the stop codon)
All coding regions have non-random sequence characteristics, based partly on codon usage preferences. Hexanucleotide (6-base) frequencies are the most effective statistic for distinguishing coding from non-coding regions. Also, using a set of known genes from an organism as a reference, pattern-recognition programs for gene recognition can be tuned to particular genomes
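Hexanucleotide statistics tuned on known genes can be sketched as a log-likelihood ratio score: each 6-mer is scored by how much more often it appears in known coding DNA than in non-coding DNA. This is a minimal sketch; the training sequences are toy placeholders (not real genes), pseudocounts handle unseen hexamers, and the function names are hypothetical.

```python
import math
from collections import Counter


def hexamer_counts(seqs):
    """Count every overlapping 6-mer in a list of training sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 5):
            counts[s[i:i + 6]] += 1
    return counts


def train(coding_seqs, noncoding_seqs, pseudo=1.0):
    """Return a function giving log(P_coding / P_noncoding) for a hexamer."""
    cod, non = hexamer_counts(coding_seqs), hexamer_counts(noncoding_seqs)
    n_cod, n_non = sum(cod.values()), sum(non.values())
    k = 4 ** 6  # number of possible hexamers, used for pseudocount smoothing

    def llr(hexamer):
        p_c = (cod[hexamer] + pseudo) / (n_cod + pseudo * k)
        p_n = (non[hexamer] + pseudo) / (n_non + pseudo * k)
        return math.log(p_c / p_n)

    return llr


def coding_score(seq, llr):
    """Sum of hexamer log-likelihood ratios; > 0 suggests coding-like statistics."""
    return sum(llr(seq[i:i + 6]) for i in range(len(seq) - 5))


# Toy training sets standing in for known genes and intergenic DNA of one genome.
coding = ["ATGGCTGCTGCTAAAGCTGCTGCTGAA", "ATGAAAGCTGCTGAAGCTGCTAAATAA"]
noncoding = ["TTTTATATATTTAATTTATATAAAT", "ATATTTTAAATATTTATATTTTATA"]
llr = train(coding, noncoding)
print(coding_score("GCTGCTGAAGCTGCT", llr), coding_score("TTTATATAAATTTAT", llr))
```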
Are the static and dynamic aspects of the human genome similar in general features to other species genomes
Yes
In which 2 directions is the field of genomics progressing as genome sequencing techniques become easier
- ) to determine more and more human genome sequences, especially those useful for disease research
- ) many different species have had their genomes sequenced for at least 1 individual
Why are the sequences of non-human genomes available publicly, but human genome sequences are not
It is a matter of privacy
Why do we sequence non-human genomes
They reveal information about the process of evolution and they help us to understand the function of different regions in the human genome
What is the important principle behind evolution and how does this apply to sequencing non-human genomes
If evolution conserves a sequence, it is likely to be essential; if evolution does not conserve it, it is probably not essential. About 98% of the human genome is neither protein-coding nor known non-coding RNA, so to understand its functions we need to compare it with other mammalian genomes: if a region is conserved, there must be a reason
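The "conserved means important" principle is often applied column by column to an alignment of the same region from several species. A minimal sketch, assuming the sequences are already aligned (equal length, "-" for gaps) and using a simple, hypothetical conservation score: the fraction of sequences that share the most common base at each position. Real analyses use phylogeny-aware scores.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common base in each alignment column."""
    n = len(aligned)
    scores = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / n)
    return scores


def conserved_positions(aligned, threshold=1.0):
    """Indices of columns whose conservation score reaches the threshold."""
    return [i for i, s in enumerate(column_conservation(aligned)) if s >= threshold]


# Hypothetical aligned fragments (e.g. human, mouse, dog); not real data.
alignment = ["ATGCTAGC",
             "ATGCTCGC",
             "ATGATAGA"]
print(conserved_positions(alignment))  # [0, 1, 2, 4, 6]
```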
Give an example of how genomes of other species have a direct application to human welfare
The genomes of pathogens that have developed antibiotic resistance can give us clues as to what we can do to combat this and keep ahead of them
List some of the other practical applications for sequencing non-human genomes
Improving agriculture and domesticated animals, biotechnological applications, the conservation of endangered species
What is metagenomic data
These are sequences determined from environmental samples (e.g. water, soil) without isolating individual organisms
What are some of the clinical applications of sequencing human genomes
Genetic testing: prospective parents may want to find out whether they are carriers for cystic fibrosis; many women are tested for potentially dangerous mutations in the genes BRCA1, BRCA2 and PALB2, which can predispose an individual to cancer. Such testing is usually carried out when family history suggests an increased risk.
Law enforcement agencies determine DNA sequences using samples from crime scenes
Does a genome belong to a species or a single organism
A single organism
Why is it essential to relate genomes to one another
To see life holistically and clearly
How do we study the genomes within species
We study genetic variation within and among populations
Give some examples of the applications of sequence variations in humans
Medicine; anthropology, including tracing migration patterns; genealogy; and personal identification, e.g. to prove that someone is a parent or in a criminal investigation
Give some examples of the applications of sequence variations in other species
Understanding the history of these species including the domestication of animals and crop plants and guiding efforts to enhance desirable traits
How do any two people differ in terms of their genomes
Any two people (except identical twins) have genome sequences that differ at about 0.1% of positions, owing to mutations
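Applied to the genome size given at the start of these notes, a 0.1% difference corresponds to a few million positions (a rough estimate for one copy of the genome):

$$3.2\times10^{9}\ \text{bp}\times 0.001 \approx 3.2\times10^{6}\ \text{differing positions}$$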
What does comparing the genomes of human allow us to learn about this 0.1% variation
It allows us to distinguish the random components of this variation from the variation that can be used to characterize different populations
Can mutations be consistent with a healthy life and a long lifespan
Yes
What do these mutations that are consistent with a healthy life contribute to
They contribute to the variety among ethnic groups and among individuals
What does the fact that mice lacking myoglobin thrive, and are even more athletic than normal mice, tell us about this mutation
That even though the mice have lost a protein, the loss benefits them
Provide an example of when a species-wide loss of an ability was not considered a disease
The loss of biosynthetic enzymes, which adds to our list of essential nutrients, e.g. most animals can synthesize vitamin C, but humans cannot, so we need it in our diet or we develop scurvy
Are most mutations synonymous or non-synonymous
Most mutations in protein-coding regions are non-synonymous (they change the amino acid sequence), and these are often deleterious, impairing protein function and threatening disease
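Why most coding mutations are non-synonymous can be checked by brute force with the standard genetic code: apply every single-base substitution to every sense codon and classify the result. This sketch treats all substitutions as equally likely, which real mutation spectra are not (transitions, for example, are more common than transversions).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutations():
    """Count synonymous, missense and nonsense outcomes of all single-base changes."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only mutations in sense (amino-acid-coding) codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


counts = classify_point_mutations()
total = sum(counts.values())
print(counts, f"non-synonymous fraction = {1 - counts['synonymous'] / total:.2f}")
```

Only about a quarter of the possible changes turn out to be synonymous.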
Define a mutation and provide examples
A mutation is a change in the DNA sequence, e.g. substitutions, insertions, deletions and translocations
Define single-nucleotide polymorphism (SNP)
An individual, isolated single-base substitution
Other than SNPs, what other mutations cause variation within species
Short deletions and variations in the number of copies of repetitive sequences, including variations in the copy number of genes
Variations in human genomes are the subject of several large-scale projects. How have databases contributed to this and how are they growing
Databases now contain over 100,000 mutations in 3,700 genes, about 16% of the total of ~23,000 genes. This number is growing rapidly: about 10,000 new mutations are discovered each year
How are SNPs inherited
Everyone carries an accumulated collection of SNPs inherited from their ancestors; some are inherited together as blocks, others are not
What happens to mutations in different DNA molecules of diploid chromosomes vs. mutations on the same chromosome in terms of how they are separated
Mutations on different DNA molecules (different chromosomes) of a diploid genome can be separated by independent assortment within a single generation, whereas mutations on the same chromosome are separated more slowly, by recombination.
When do mutations in sequences remain together
In haploid sequences that do not undergo recombination, such as most of the Y chromosome and mitochondrial DNA
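The contrast between independent assortment, recombination and non-recombining DNA can be expressed with a standard population-genetics result not stated in these notes: linkage disequilibrium D (the excess association between two alleles) decays as D_t = D_0(1 - r)^t, where r is the recombination fraction per generation. This is a textbook idealisation (large random-mating population, no selection) and the numbers are illustrative: r = 0.5 stands for loci on different chromosomes, a small r for nearby loci on one chromosome, and r = 0 for non-recombining sequences such as mitochondrial DNA.

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium left after t generations: D_t = D_0 * (1 - r) ** t."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction r must lie between 0 and 0.5")
    return d0 * (1 - r) ** generations


for label, r in [("different chromosomes", 0.5),
                 ("same chromosome, ~1 cM apart", 0.01),
                 ("no recombination (e.g. mtDNA)", 0.0)]:
    print(f"{label:32} D after 10 generations = {ld_after(0.25, r, 10):.4f}")
```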
By what mechanisms can mutations affect human health
> Some mutations destroy the activity of a protein, e.g. haemophilia (blood does not clot properly) arises from a mutation in a protein of the blood coagulation cascade, either factor VIII or factor IX. Another example is phenylketonuria (build-up of phenylalanine in the body), which arises from a mutation in phenylalanine hydroxylase, the enzyme that converts phenylalanine to tyrosine
Some mutations cause disease only in combination with unusual features of the environment or specific triggering events, e.g. the Z mutation of alpha1-antitrypsin (Glu342 to Lys) creates a predisposition to lung damage, and smoking aggravates its effects
Loss-of-function mutations are usually recessive, so homozygosity for the mutation has more severe consequences than heterozygosity
Many diseases are associated with the formation of insoluble aggregates, usually of misfolded proteins, e.g. classical amyloidoses (build-up of amyloids in organs), Alzheimer's and Huntington's diseases, aggregates of misfolded alpha1-antitrypsin, and prion diseases (a prion is a type of protein that can trigger normal proteins in the brain to fold abnormally; prion diseases are a family of progressive neurodegenerative diseases)
How does the polymerization of insulin affect diabetes therapy
It creates problems in production, storage and delivery in diabetes therapy
How do mutations that destabilize proteins affect the proportion of misfolded proteins
It can increase the proportion of misfolded proteins and this causes them to show a greater tendency to form aggregates
Give an example of when a mutation produced a defective protein that resulted in a beneficial trait, so it remained in populations
The genes for sickle-cell disease and for glucose-6-phosphate dehydrogenase deficiency provide resistance to malaria
Can the dysfunction of a regulatory protein or receptor disrupt the operation of a pathway even if all the other components of the pathway are normal
Yes
What are some of the effects of when an abnormal regulatory protein cannot be activated or when it is constantly activated and cannot be shut off
These mutations can produce:
> physiological defects: a number of diseases are associated with mutations in G protein-coupled receptors. Some mutations in opsins are associated with colour blindness. Certain mutations in the G protein coupled to the olfactory receptors lead to loss of the sense of smell
> developmental defects: several types are traceable to hormone receptors, e.g. Laron syndrome, in which a mutation in the human growth hormone receptor results in short stature. Giving the person growth hormone externally does not help them grow, because the defective receptor cannot respond to it
What is the difference between the mutation in a healthy cell and a cancer cell
Healthy cells accumulate mutations at a moderate rate, while cancer cells accumulate mutations at an increased rate because they have lost checks on the accuracy of DNA replication
What is the best way (in research) to distinguish variations arising in the cancer cells themselves from those present in healthy cells
Compare sequences from tumour cells with those from normal cells from the same individual, rather than using a single reference genome
Why do researchers sequence cancer genomes
To understand the origins of cancer, precise diagnosis, prognosis and therapy
Define a haplotype
A group of alleles in an organism that are inherited together. It can refer to a group of alleles or to a set of SNPs found on the same chromosome
When recombination occurs between mutations in the same chromosome how does the distance of the loci affect the frequency
The greater the distance, the greater the frequency of recombination, but recombination rates vary widely along the genome
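The link between distance and recombination frequency is often modelled with Haldane's mapping function, r = (1 - e^(-2d))/2, where d is the genetic map distance in morgans. This model is swapped in for illustration and is not stated in these notes; it assumes crossovers occur independently (no interference). It captures the rule that recombination rises with distance but levels off at r = 0.5. Because rates vary along the genome, converting physical distance (kb) to d needs a local rate; the default `cm_per_mb = 1.0` is an assumption, roughly the human genome-wide average.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Haldane's mapping function: recombination fraction for map distance d (morgans)."""
    return 0.5 * (1 - math.exp(-2 * d_morgans))


def recombination_fraction(distance_kb: float, cm_per_mb: float = 1.0) -> float:
    """Convert a physical distance to r, assuming a uniform local recombination rate."""
    d_morgans = (distance_kb / 1000) * cm_per_mb / 100
    return haldane_r(d_morgans)


for kb in (10, 100, 1_000, 10_000, 100_000):
    print(f"{kb:>7} kb -> r = {recombination_fraction(kb):.4f}")

# A hot spot can be modelled crudely as a region with a much higher local rate.
print(recombination_fraction(100, cm_per_mb=50.0))
```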
During recombination what is likely to happen to SNPs located in recombination hot spots vs. cold spots
SNPs are more likely to be separated in hot spots and to stay together in cold spots
In humans, about how long is the average region that stays intact
Regions about 100 kb in length
How many SNPs are there per region that stays intact
With an average SNP density of 0.1%, about 100 SNPs per 100 kb. However, very few of the possible combinations of these SNPs actually occur.
How do you calculate the number of possible combinations of the SNPs
There is a huge number of possible combinations (2^100), but many 100 kb regions show only about 5 SNPs. Therefore 2^5 = 32 possible combinations of SNPs
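The counting argument can be sketched as follows: with n SNPs, each with two alleles, there are 2^n possible haplotypes, but a sample usually contains only a handful. The phased haplotypes below are hypothetical, written as strings of the allele observed at each of 5 SNP positions, and the function names are illustrative.

```python
def possible_haplotypes(n_snps: int) -> int:
    """Number of haplotypes possible if every SNP has two alleles."""
    return 2 ** n_snps


def observed_haplotypes(samples):
    """Distinct haplotypes actually seen, with how many chromosomes carry each."""
    counts = {}
    for hap in samples:
        counts[hap] = counts.get(hap, 0) + 1
    return counts


samples = ["ACGTA", "ACGTA", "GTGCA", "ACGTA", "GTGCA", "ACTTA", "GTGCA", "ACGTA"]
print(possible_haplotypes(100))       # all combinations of 100 SNPs
print(possible_haplotypes(5))         # 32
print(observed_haplotypes(samples))   # {'ACGTA': 4, 'GTGCA': 3, 'ACTTA': 1}
```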
What is the significance of haplotype in research
It simplifies the search for genes responsible for diseases, or for any genotype-phenotype correlation. The target might be a single base out of 3.2 x 10^9; by correlating the phenotype with a haplotype, you narrow the search to a ~100 kb region containing only a few genes
What is the objective of the International HapMap Project
It collects and curates haplotype distributions from several human populations. SNPs are its raw material; the project identifies correlations among them
Describe the phases of the project
Phase I: the goal was to measure the distribution of at least one SNP every 5 kb across the entire human genome. Blood samples were provided by 269 individuals from 4 populations. Over 1 million SNPs of significant frequency (>5%) were documented. In addition, 10 selected 500 kb regions were fully sequenced in 48 samples.
Phases II and III: extended the analysis of the samples to determine an additional 4.6 million SNPs from the same individuals
What were the results of the International HapMap Project
> Most of the variations appeared in all populations that were sampled. Some of the inter-populations differences reflect different relative amounts of the same SNPs
A very few SNPs are unique to a certain population, e.g. out of > 1 million SNPs, only 11 are consistently different (in the sample studied) between all individuals of European origin and all individuals of Chinese or Japanese origin
The genomes of individuals from Japan and China are very similar, which suggests more recent common ancestry than other population pairs in the study
The X chromosome varies more between populations than any other chromosome. This might be because males have only one X chromosome, so its genes are more exposed to selective pressure, and because recombination between X chromosomes can occur only in females.
Lengths of haplotype blocks vary among the different sources of samples. They tend to be shorter in populations from Africa, consistent with the idea of an African origin of the human species: the older the population (the larger the number of generations), the more chances recombination has had to break up the blocks.
What is the MHC
The major histocompatibility complex (in humans known as the human leucocyte antigen [HLA] system) is a set of proteins encoded in a ~4 Mb region on chromosome 6.
How are MHC proteins expressed in a species of vertebrates
Each individual expresses a set of MHC proteins
Do these proteins have sequence variability, explain
Yes, this system is highly polymorphic- with 50-150 alleles per locus. It has a higher sequence variability than most polymorphic proteins
How does the individual variation of the MHC region compare with other haplotype blocks
The set of MHC proteins expressed defines a partial haplotype of an individual and compared with other haplotype blocks the MHC region shows wide individual variation
The MHC region contains over 120 genes coding for proteins, list the functions of these proteins
> provides the mechanism by which the immune system distinguishes ‘self’ molecules from foreign invaders that must be removed
determines individual profiles of competence for resistance to diseases
are useful markers for determining relationships among populations of humans and animals. Also, for tracing large-scale migrations and population interactions
Explain how MHC haplotypes control donor-recipient compatibility in transplants
Unless they are immunosuppressed by drugs, surgical patients will reject transplanted organs (except from an identical twin) because the transplant is recognized as foreign. MHC proteins bind peptides and present them on cell surfaces. The triggering event that alerts the immune system to a foreign protein is the recognition, by a T-cell receptor, of a complex between an MHC protein and a peptide derived from denaturation and cleavage of the foreign protein.
Provide another activity that MHC-peptide complexes provide in humans
They are involved in the removal of self-reactive T cells in the thymus during development, at the stage when the distinction between self and non-self is ‘learnt’
How does MHC haplotype influence autoimmune diseases
It influences the breakdowns in self/non-self discrimination that result in a person’s immune system attacking his/her own tissues. Examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, type I diabetes and systemic lupus erythematosus
Explain how MHC haplotypes determine patterns of disease resistance
Different MHC molecules have different binding specificities and can present different sets of peptides. People whose MHC molecules do not present epitopes (the part of an antigen recognized by an antibody or T-cell receptor) from a particular pathogen will be susceptible to infection by it, e.g. MHC haplotype is a predictor of survival time in people infected with HIV
Explain how MHC haplotypes influence mate selection
People tend to find partners whose MHC haplotypes differ from their own more attractive. The mechanism appears to be a link between MHC haplotype and body scent. This preference tends to produce offspring that can present a larger spectrum of peptides, giving broader resistance to infection.
Provide a short summary of MHC haplotypes
A person’s MHC haplotype is the set of alleles at over 100 highly polymorphic sites. MHC haplotype is a signature of individuality: it governs transplant rejection, resistance to infection and mate selection
Define population
A population is an interacting and interbreeding group of individuals of the same species that inhabit the same geographical area
When will a population display narrow variation, provide an example
When the population has passed through a bottleneck or has developed in isolation from a small “founder group”, e.g. cheetahs are as closely related to one another as human siblings are
When is a population likely to have a long evolutionary history
When it displays high variation
What are the two observations that make up the idea of a species
> Many living things can be classified into discrete, non-overlapping groups. These are individual species
A catalogue of these groups can be organized into a hierarchy based on their similarities
Discuss the species concept
Biology has a long-standing commitment to the idea of species. The species concept is based on humans’ desire to organize and categorize organisms into groups, for better understanding and for research purposes. The term species appears in the books of Darwin and Linnaeus, but it still does not have a concrete definition. One reason is that biology is messy and nature doesn’t work in nicely packaged boxes. Many biologists have worked hard to find a definition, but there are many exceptions that challenge these definitions.
Why have many groups of higher plants and animals coalesced into a hierarchy of discrete groups, with differentiating sets of features?
This is because of evolution, specifically the discreteness of the evolutionary niches for which different populations compete, and the emergence of reproductive isolating mechanisms, both geographic barriers and genomic ones.
How does evolution generate variation in different types of organisms
All living things are subject to mutations. In diploid organisms, independent assortment and recombination during sexual reproduction provide variation. Haploid organisms such as bacteria use gene transfer, including horizontal gene transfer, to generate and propagate useful variants.
How do eukaryotes and prokaryotes generate variation
Eukaryotes use sexual reproduction: they have ancestors and descendants and can be organized into a hierarchy, whereas prokaryotes, which exchange genetic material, do not fit the classical hierarchy.
Discuss the role of genomics in ‘species’
Genomics has raised both opportunities and challenges for defining species. On the positive side, the DNA sequences of individuals of different species differ more than those of individuals of the same species. On the negative side, the extent of sequence diversity that distinguishes species varies across taxa, so genomic distance (variation across the genome) does not by itself provide a definition of species.
How are bacterial species defined in microbiology
Bacterial species are defined as sets of strains that share ≥ 97% sequence identity in their 16S rRNA genes
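The 97% rule can be illustrated with a greedy clustering of strains by 16S rRNA percent identity. This is a simplified sketch, assuming the sequences are already aligned to equal length; real pipelines compute identity from proper alignments. The sequences, strain names and the helper `with_substitutions` are made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical bases (sequences pre-aligned)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_species_clusters(seqs: dict, threshold: float = 97.0):
    """Put each strain in the first cluster whose representative it matches >= threshold."""
    clusters = []  # list of (representative_name, [member names])
    for name, seq in seqs.items():
        for rep, members in clusters:
            if percent_identity(seqs[rep], seq) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))  # start a new putative species
    return clusters


def with_substitutions(seq: str, n: int) -> str:
    """Hypothetical helper: swap the first n bases (A<->T, C<->G) to make toy variants."""
    swap = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(swap[b] for b in seq[:n]) + seq[n:]


base = "ACGT" * 25  # 100-position toy "16S" fragment
strains = {
    "strain_1": base,
    "strain_2": with_substitutions(base, 2),   # 98% identical to strain_1
    "strain_3": with_substitutions(base, 10),  # 90% identical to strain_1
}
print(greedy_species_clusters(strains))
```

Here strain_1 and strain_2 fall into one putative species, while strain_3 starts its own.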
Explain why we reconstruct genomes
They contain records of evolutionary history. The organisms on Earth are the products of their responses to the environment and of their interactions with other individuals and species, for example competition among members of the same species, and attack and defense between different species (e.g. humans and pathogenic viruses and bacteria)
Explain how the geological environment has been imposed by early life in the longer term
Early organisms released oxygen through photosynthesis, 2.5-1.9 x 10^9 years ago. This change in the composition of the atmosphere made the development of aerobic metabolism possible
An ecologist named G. E. Hutchinson wrote a book entitled The Ecological Theater and the Evolutionary Play. What did he emphasize
The interdependence of the environment and living things, which reciprocally affect each other. For example oxygen introduction
Why will we never know the full extent of the species that nature has generated
Most species that have ever lived are extinct: the species alive today represent only about 1% of all the species that have existed on Earth
How was the conclusion that all living organisms are related reached
For higher organisms, their anatomy and embryology were studied and compared. For prokaryotes, their biochemistry and genomes were studied.
Explain the causes of extinction
Darwin was the first to argue that new species arise, but Cuvier was the first to propose that species can go extinct. Cuvier believed that extinction was the result of natural catastrophes. Sometimes natural catastrophes do cause extinction, e.g. the asteroid that struck what is now the Yucatan peninsula of Mexico about 65 million years ago (at the boundary between the Cretaceous and Tertiary periods). In other cases, however, species become extinct because of gradual changes in the environment. Extinction can also result from deliberate activity, e.g. sailors killing dodos on Mauritius, or the eradication of smallpox. Today some extinctions result from human activity (habitat destruction) and others from natural events, such as elm yellows, which affects elm trees; the fungal white-nose syndrome, which affects bats; and devil facial tumor disease, a cancer transmitted by biting in Tasmanian devils. The mass extinction now under way is largely the result of human activity, direct and indirect: excessive hunting, habitat destruction (including the effects of climate change), the introduction of species that replace native ones, and the release of toxic chemicals into the environment. The extinction of one species can also lead to the loss of another, e.g. some plants depend on a particular insect for pollination, so if that insect becomes extinct the plant will suffer. The origin of new species and the extinction of others is a feature of the history of life.
When is a species considered extinct
If no individuals have been seen for 50 years
Provide examples of species that are extinct
A marsupial called the thylacine, or Tasmanian wolf, has been extinct since 1937 because of hunting by dogs. The northern white rhino, a subspecies of rhino, is functionally extinct: only 2 females remain because the last male died. However, some of his sperm has been stored and could be used to inseminate the remaining females, or females of another subspecies, to try to save the northern white rhino. An example of survival is Père David’s deer: native to China, it was kept in the Emperor’s nature reserve, and a few were given to zoos around Europe. Those populations grew, and eventually some were reintroduced into the wild.
Can we reverse extinction
There are 2 proposed methods:
> selective breeding: to recover a species, start with a closely related living species, choose individuals with a phenotype similar to the extinct one and breed them over many generations to produce an animal resembling the extinct species, e.g. for mammoths, start with the Asian elephant and select individuals with long hair, long tusks, a hump behind the skull, etc., continuing for many generations. The problem with elephants is their long generation time, so the approach is more practical for a species such as the passenger pigeon, whose living relatives have shorter generation times. A further issue is that even if the individuals have a phenotype similar to the extinct species, they do not have its exact genotype, so they cannot be the extinct species.
> Use the CRISPR/Cas system to modify the genome of a living relative to approach that of the extinct species. This approach assumes that you know the genome sequence of the target species and that there is a related species to act as the host.
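The selective-breeding route, and why generation time matters, can be made quantitative with the breeder's equation R = h²S: the response per generation R equals the heritability h² times the selection differential S (how far the chosen parents' mean lies above the population mean). The equation is not part of these notes; it is a standard quantitative-genetics idealisation that holds h² and S constant across generations. All numbers below (trait values, heritability, generation times) are hypothetical.

```python
import math


def generations_needed(start: float, target: float, h2: float, s: float) -> int:
    """Generations of selection needed to move the trait mean from start to target.

    Breeder's equation: response per generation R = h2 * S.
    """
    response = h2 * s
    if response <= 0:
        raise ValueError("h2 * S must be positive for the mean to move toward the target")
    # round() guards against floating-point noise, e.g. 0.3 * 3.0 == 0.8999999999999999
    return math.ceil(round((target - start) / response, 9))


def years_needed(generations: int, generation_time_years: float) -> float:
    """Total time for the breeding programme."""
    return generations * generation_time_years


# Hypothetical trait: hair length in cm, moved from 2 cm to 20 cm.
g = generations_needed(start=2.0, target=20.0, h2=0.5, s=3.0)
print(g, "generations")
print(years_needed(g, generation_time_years=25), "years with a long generation time")
print(years_needed(g, generation_time_years=1), "years with a short generation time")
```

The number of generations is the same in both cases; only the generation time changes how long the programme takes.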