Genomics Flashcards
What is the difference between complex and simple genomes?
Simple genomes have no introns, not much repetitive content and are mostly protein coding
Why do prokaryotes have such small genomes?
They are limited by power - DNA replication costs energy, so genome size is capped by the power the organism can produce
When eukaryotes engulfed bacteria, they gained an energy supply that no longer constrained genome size, allowing larger, messier genomes to develop
What is the C-value enigma?
Genome size doesn’t correlate with organism complexity (C-value is the amount of DNA in a haploid nucleus). It is resolved by the fact that very little of the genome is protein coding in eukaryotes.
What is the structure of chromosomes?
Have a short arm (p = petite) and a long arm (q = the letter after p)
Have a centromere (where the kinetochore forms)
Have telomeres (for replication and stability, are conserved tandemly repeating sequences)
What are the different types of eukaryotic chromosomes?
Metacentric (centromere in the middle)
Submetacentric (centromere off centre)
Acrocentric (satellite p arms)
Telocentric (no p arms)
What are the different types of tandem repeats?
Minisatellites (10-100bp units), found in telomeric regions in humans
Microsatellites (1-6bp units), found throughout the genome, are the large majority of all repeats
Macrosatellites (>100bp units), difficult to analyse with PCR
Useful for fingerprinting and population genetics
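The size ranges on this card can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and unit lengths of 7-9bp are returned as "unclassified" because the card does not assign them to a class.

```python
def classify_tandem_repeat(unit_length_bp: int) -> str:
    """Classify a tandem repeat by the length of its repeat unit (in bp)."""
    if unit_length_bp < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_length_bp <= 6:
        return "microsatellite"   # 1-6bp units
    if 10 <= unit_length_bp <= 100:
        return "minisatellite"    # 10-100bp units
    if unit_length_bp > 100:
        return "macrosatellite"   # >100bp units
    # 7-9bp is not covered by the ranges on the card
    return "unclassified"


for unit in (2, 33, 2500, 8):
    print(unit, classify_tandem_repeat(unit))
```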
What is satellite DNA?
Short, tandemly repeated sequences, including mini, micro and macro satellites. Named as they appear as a ‘satellite’ when centrifuging sheared DNA in a caesium chloride density gradient as they are AT rich
What are pseudogenes?
Genes that are inactivated due to mutation (frame shift, nonsense) or regulation
Often occurs if the gene is non-essential, or there are 2 copies (second copy accumulates mutation)
What are paralogs?
Homologous genes separated by gene duplication - genes with a common ancestor (have been duplicated)
What are orthologs?
Homologous genes separated by speciation - e.g. the vitamin C synthesis gene, which works in pigs but is non-functional in humans
What are examples of transposed sequences?
Retroviruses
Transposable elements - DNA that can move around the genome
Processed pseudogenes - integration of cDNA back into a genome; has a poly A tail, no introns and no promoter
What are the characteristics of retroviral elements?
LTR
3’ and 5’ target sites for integration
Could interrupt a gene
Replicate DNA as they insert (target site duplication), disrupting gene expression
What are the characteristics of class I retrotransposons?
Copy and paste mechanism via an RNA intermediate
2 types
Type 1 are LTR: similar to retroviruses without env, don’t form infectious particles
Type 2 are non-LTR: LINEs (encode reverse transcriptase, make up 21% of human genome, most are non-functional) and SINEs (no functional protein, need other mobile elements to move)
What are the characteristics of class II transposons?
Cut and paste mechanism
Encode a transposase enzyme
Most are inactive (e.g. deletions)
What are processed pseudogenes?
Mature mRNA is reverse transcribed and integrated into the genome. Lacks a promoter (so is dead on arrival) and introns, and has a poly A tail
Often have 5’ truncations due to low processivity of reverse transcriptase
Are dispersed throughout the genome (not near original gene)
Have target site duplication from insertion
What is the evolutionary story behind the IRGM gene family?
Immunity Related GTPase gene family
3 copies of the family in most mammals (humans only have 2)
50 million years ago, all but one copy was inactivated in monkey/great ape ancestor
24 million years ago, a retrovirus inserts at the start of a gene and forms a new promoter
12 million years ago, functional copy was fixed in gorilla, chimp and human lineage
Today is expressed in several tissues in humans
Where is variation in genomes seen?
Sequence
Base modification (e.g. methylation)
Histone modification
Chromosome structure (length, inversions, duplications, deletions)
How does variation arise?
Mistakes in replication and chromosomal recombination and segregation. Has to be inherited i.e. in the germline to persist
What are SNPs?
Single Nucleotide Polymorphisms (or Variants, SNVs). Can be a transition (purine to purine, pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or pyrimidine to purine). Could also be a single nucleotide deletion. Arise due to natural mutation or exposure to a carcinogen
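The transition/transversion rule can be checked with a short function. A minimal sketch: the function name is hypothetical, and it only handles single-base substitutions between A, C, G and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Same chemical class on both sides means a transition
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("C", "A"))  # pyrimidine to purine
```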
What are the consequences of DNA variation?
Most are neutral and tolerated (a lot of DNA doesn’t encode protein; genetic code is degenerate so amino acid may not be altered; some amino acids can be interchanged). Some somatic mutations contribute to the changes seen in cancer. Occasionally there is positive or negative selection for a mutation
What are the most sensitive parts of the genome to mutation?
CpG dinucleotides that are subject to methylation. Methylated C can be deaminated to make a T. This can either be repaired (using the G on the other strand as a template) or fully converted to a T:A pair
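To see what this looks like at the sequence level, the sketch below lists CpG sites in a sequence and shows the result of an unrepaired deamination (CG becomes TG on the strand shown). The function names are hypothetical, and the code assumes every CpG is methylated, which is a simplification.

```python
def cpg_positions(seq: str) -> list:
    """Return the 0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def deaminate(seq: str, pos: int) -> str:
    """Replace the C of the CpG at pos with T, as if deamination went unrepaired."""
    seq = seq.upper()
    if seq[pos:pos + 2] != "CG":
        raise ValueError("position is not the C of a CpG dinucleotide")
    return seq[:pos] + "T" + seq[pos + 1:]


sequence = "ACGTTCGA"
print(cpg_positions(sequence))   # CpG sites that could mutate
print(deaminate(sequence, 1))    # the first CpG becomes TpG
```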
What are CNVs?
Larger regions of DNA subject to duplication or deletion. They are evolutionarily important as sequences can diverge after a duplication. They usually arise due to non-allelic recombination (they are flanked by sequences with high homology).
What are the consequences of CNVs?
Pathways in which there is tight regulation of gene expression are most commonly disrupted such as control of foetal growth and brain development (or revealing a mutation on the ‘normal’ allele - loss of heterozygosity)
What is an example of non-sequence/DNA related mutation?
Epigenetic mutation. Could either be nucleic acid (e.g. methylating DNA) or protein (e.g. histone) modification
What are large chromosomal abnormalities?
Chromosomal deletions or unbalanced translocations that result in an allelic mis-balance of many genes, thus disrupting key pathways.
Normally arise through errors in chromosome pairing and segregation in meiosis and germ cell maturation
Why do diseases such as sickle cell anaemia and cystic fibrosis persist?
Whilst the homozygous mutation is detrimental, heterozygous carriers have a survival advantage against another disease e.g. sickle cell carriers are protected against malaria
What are functional polymorphisms?
Variation in DNA with an impact on phenotype. Will either be in an open reading frame or in regulatory elements (e.g. promoters, enhancers, ncRNAs)
What is linkage analysis?
Used with pedigrees to identify genomic regions and loci responsible for a disease phenotype. Microsatellites and SNPs are used.
What is association analysis?
Uses SNPs (on a microarray) to see if SNPs associate with a specific allele/phenotype
How can we use variation?
Genetic maps - placing loci in relative order based on recombination events
Physical maps - compare to a reference genome
Linkage analysis
Association analysis
How can epigenetic mutation be studied?
Chemical modification to allow sequencing of methylated cytosine (either directly or through arrays)
Antibodies to histone modifications to pull down these areas of the genome and analyse with direct sequencing or arrays
How can we identify a SNP from a mutation?
Through SNP and mutation database
Individual labs contribute results from gene studies of patients and normal family members
Completion of a draft genome sequence
HapMap and 1000 genome projects
Improving sequencing technologies has increased throughput
How can you create a SNP chip?
Know the bases that are variable in normal populations and have a database of all polymorphisms (1000 genome and HapMap project)
What is haploinsufficiency?
When a single remaining functional copy of a gene (after mutation of the other) is not enough to prevent a phenotype. This is rare in the population, as we all carry many mutations. Most haploinsufficient genes have a specific expression profile and are often involved in early development
What are the advantages of GWAS studies?
Include genes and loci that may not have been considered (e.g. if functions are unknown) in their analysis, unlike candidate gene approaches
What controls are necessary in GWAS studies?
Case-control statistical analysis
This relies upon having 2 groups from the same population, rigorous phenotypic analysis (avoid phenotypes that can have a phenocopy - where the same phenotype can be achieved through many gene combinations) and if appropriate also match age and sex
p values must be adjusted for multiple testing e.g. by dividing the genome-wide significance threshold by the number of markers used (or more complicated methods) to reduce false positives
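The simplest adjustment, dividing the significance level by the number of markers, is the Bonferroni correction. A minimal sketch follows; the function names, marker names and p values are made up for illustration.

```python
def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Per-marker significance threshold after Bonferroni correction."""
    if n_markers < 1:
        raise ValueError("need at least one marker")
    return alpha / n_markers


def significant_markers(p_values: dict, alpha: float = 0.05) -> dict:
    """Keep only the markers whose p value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {marker: p for marker, p in p_values.items() if p < threshold}


# Hypothetical results for four markers (corrected threshold = 0.05 / 4)
results = {"snp_a": 1e-9, "snp_b": 0.003, "snp_c": 0.04, "snp_d": 0.2}
print(bonferroni_threshold(0.05, len(results)))
print(significant_markers(results))
```

With one million markers and a significance level of 0.05, the same division gives a per-marker threshold of 5e-8.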
What novel approaches can be applied to GWAS?
Simplifying the phenotype by being quantitative
Grouping genes to spot common themes or pathways
Combining different genetic analysis techniques to improve confidence
What are the drawbacks of GWAS?
If there are many loci, need larger studies and larger sample sizes to reach statistical significance
Only looks at DNA sequence as opposed to epigenetic mutations
Doesn’t tell you about parental origin of the locus
How do micro satellites form?
Slippage of polymerase - strands dissociate and a stem loop forms, resulting in expansion or contraction
How can chromosomes be prepared?
Cells must be dividing and arrested. Cells are swollen osmotically to spread the chromosomes. Cells are fixed to glass slide and chromosomes are stained for identification
When are chromosome preparations used?
Blood or tissue samples are used in post natal diagnosis e.g. bone marrow in leukaemia studies
Amniotic fluid etc are used in prenatal diagnosis
Sperm are used in fertility studies
What is amniocentesis?
Using amniotic fluid cells for pre-natal diagnostics
What are chorionic villi?
Finger-like projections of the placenta into the uterine wall. Chorionic villus sampling (CVS) is used in pre-natal diagnostics of high risk pregnancies as it can be done earlier than amniocentesis
What are the advantages and disadvantages to different pre-natal diagnostic techniques?
CVS - placenta often has a different chromosomal constitution to the foetus
Amniotic preps are good, but can only be done at 14 weeks
New technology allows analysis of foetal DNA in the mother’s blood
How can chromosomal preparations be stained?
G-banding, using a Giemsa stain and trypsin digest (cuts grooves at AT rich regions). Gives dark AT rich and light GC rich bands
FISH (fluorescent in-situ hybridisation), using DNA probes with fluorescent markers to light up complementary regions. Strands must be separated (e.g. use of formamide)
What are the applications of FISH?
Chromosome painting
Multicolour chromosome banding
Gene mapping
Counting chromosomes in nuclei
Nuclear organisation
How does chromosome painting work?
A few dyes can give many colours due to overlapping sequences. Gives different colours for all chromosomes
What is karyotyping?
Taking a photograph of G-stained chromosomes and separating and pairing them up. Allows observation of any large-scale abnormalities and gene mapping
What are the different examples of chromosomal disorders?
Numerical abnormalities (aneuploidy, polyploidy - usually embryonic lethal)
Structural abnormalities (deletions, duplications, insertions, unbalanced translocations - all severe; balanced translocations, inversions, Y chromosome deletions - mild symptoms, may lead to infertility)
What chromosomal numerical abnormalities are commonly found in humans?
Trisomy - one extra chromosome. Get 21, 18 and 13 along with sex chromosomes in live births
Monosomy - only see monosomy X in live births
What is Down syndrome?
Trisomy of chromosome 21. Most common aneuploidy in live births
What is Patau syndrome?
Trisomy of chromosome 13
What are the different sex chromosome aneuploidies?
XO - Turner syndrome: short stature, webbed neck
XXY - Klinefelter syndrome: get breast development
XYY syndrome: very tall, may have learning difficulties
Turner and Klinefelter syndromes usually cause infertility; XYY males are usually fertile
How does aneuploidy arise?
Failure of chromosomes to disjoin properly at first division
What are the different types of deletions in chromosomes?
Terminal deletion - only requires 1 break point
Interstitial - requires 2 break points
Often get serious clinical features
What are the consequences of deletions on the Y chromosome?
Has very few genes and a lot of ‘junk’ DNA (evolved from a fully functional chromosome)
Only has genes to do with spermatogenesis and ‘male-ness’
Deletions often result in infertility but clinical features aren’t too severe as there are very few genes
What are the clinical implications of duplications and insertions?
Extra DNA so leads to severe clinical features
Duplications are extra pieces copied next to the original
Insertions are extra pieces inserted from another chromosome
What is the difference between unbalanced and balanced translocations?
Unbalanced - loss or gain of genetic material leads to partial trisomy or monosomy. Severe clinical abnormalities
Balanced - no net gain or loss of genetic material, so usually no clinical effect unless a gene is disrupted. Risk to offspring (often get unbalanced translocations or infertility) as meiosis is messed up - reduced recombination in pairing cross, unbalanced gametes produced
What are the types of balanced translocations?
Robertsonian translocation - end to end fusion of acrocentric chromosomes
Reciprocal translocations - breaks in 2 chromosomes and fusion of one to the other. Important in cancer cells. Can be detected with chromosome painting
How do inversions arise?
2 break points, piece in-between inverts. Can be paracentric (no centromere) or pericentric (centromere involved).
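The paracentric/pericentric distinction comes down to whether the centromere lies between the two break points. A minimal sketch, assuming positions are given in any consistent coordinate along the chromosome (the function name and example numbers are hypothetical):

```python
def classify_inversion(break1: float, break2: float, centromere: float) -> str:
    """Classify an inversion by whether its two break points flank the centromere."""
    start, end = sorted((break1, break2))
    if start == end:
        raise ValueError("an inversion needs two distinct break points")
    # Break points on either side of the centromere: centromere is inverted too
    if start < centromere < end:
        return "pericentric"
    # Both break points in the same arm: centromere is not involved
    return "paracentric"


print(classify_inversion(10, 40, centromere=25))
print(classify_inversion(30, 40, centromere=25))
```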
What are the clinical consequences of inversions?
No clinical features unless a gene is disrupted. Can lead to reduced fertility due to messing up meiosis - reduced recombination within pairing loop, producing unbalanced gametes that may not develop
What are the risk factors in gamete aneuploidy?
Eggs - age
Sperm - age, smoking, chemotherapy
When are patients referred for cytogenetic testing?
Is expensive, so must be relevant
4-12 weeks gestation - observe spontaneous abortions here, trisomy and unbalanced rearrangements
12 weeks to term - abnormalities picked up on ultrasound/if at risk (e.g. older mothers or a family history or if balanced rearrangement in one of the parents)
Neonatal period - if have congenital abnormalities. Looking for trisomy, unbalanced translocations, deletions etc
Early development - if milestones are not met. Subtle chromosomal abnormalities e.g. fragile X
Puberty - inappropriate sexual development
Infertility and reproductive failure - balanced rearrangements
As part of a study
What are the problems with pre-natal cytogenetic screening?
Mosaicism - some cells normal, some not
Contamination of maternal cells
Risk to the foetus - so target screening to at risk groups
What are the outcomes following pre-natal cytogenetic screening?
Offered choice of abortion
Prepare for affected child
What alterations in chromosome position are associated with development or disease?
X inactivation - X at the periphery
Random arrangement in senescent and quiescent cells
Sex chromosomes to middle during spermatogenesis
Chromosome 18 to the centre in cancer
How are genes positioned on chromosomes in the nucleus?
Active genes tend to be towards the edge of chromosome territory for access to transcriptional machinery
Near foci of RNA polymerase II
Active genes towards nuclear centre
What is polygenic inheritance?
Traits/diseases caused by the impact of many different genes each having a small individual effect on a phenotype
What are quantitative traits?
All individuals can be placed on the spectrum based on a defined value
What are threshold traits?
Traits in which individuals must carry a sufficient number of risk alleles to have the phenotype
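The threshold idea can be sketched by counting risk alleles across loci and comparing the total with a cut-off. This is a toy model: the locus names and threshold are invented, and it assumes every risk allele counts equally, which real traits need not follow.

```python
def risk_allele_count(genotype: dict) -> int:
    """Sum risk alleles over loci; each locus carries 0, 1 or 2 risk alleles."""
    for locus, n in genotype.items():
        if n not in (0, 1, 2):
            raise ValueError(f"{locus}: expected 0, 1 or 2 risk alleles, got {n}")
    return sum(genotype.values())


def shows_phenotype(genotype: dict, threshold: int) -> bool:
    """True if the individual carries at least `threshold` risk alleles."""
    return risk_allele_count(genotype) >= threshold


# Hypothetical individual typed at four loci
person = {"locus_1": 2, "locus_2": 1, "locus_3": 0, "locus_4": 1}
print(risk_allele_count(person))
print(shows_phenotype(person, threshold=5))
```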
What are the examples of model free, non-parametric analysis?
Linkage analysis - affected siblings or extended pedigrees
Homozygosity mapping - specific type of linkage analysis in pedigrees (founder effect)
Transmission disequilibrium test - pedigree based association study
Association mapping - population based
What are the principles of model free linkage analysis?
Looking at whether affected relatives share a chromosomal segment more often than would be expected - shared segment analysis
Don’t need to specify the mode of inheritance, number of loci, gene frequency or penetrance
What are the advantages and disadvantages of model free linkage analysis?
Can use smaller family clusters
More robust to errors (no model to have errors in assumptions)
Less powerful - need more individuals for statistical significance
What is identical by descent, and how can it be determined?
Two alleles are identical by descent (IBD) if they are copies of the same ancestral allele. Determining it means working out which parent the phenotype is inherited from, and also which allele of that parent. If one parent is heterozygous (A/C), the other is homozygous (C/C) and the allele is inherited from the heterozygous parent, then it is clear which allele must be linked to the disease in heterozygous children (A/C). Otherwise, it is necessary to use markers with multiple alleles (SNPs, microsatellites etc).
How does sibling pair analysis work?
Uses 2 siblings, each with the disorder. Looks at allele inheritance from heterozygous parents (A/C) - if there is no linkage, then ¼ will be homozygous for A, ½ will be heterozygous and ¼ will be homozygous for C. A departure from these proportions suggests linkage of the allele with the disorder. Want to be able to estimate identical by descent sharing - know which allele links with the disease. To do this, highly polymorphic markers are required
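The ¼ : ½ : ¼ expectation comes from enumerating the A/C x A/C cross, and a departure from it can be measured with a chi-square statistic. The sketch below is illustrative only: the function names and observed counts are made up, and a real sib-pair analysis would work with IBD sharing at polymorphic markers rather than raw genotype counts.

```python
from collections import Counter
from itertools import product


def expected_genotype_fractions(parent1: str, parent2: str) -> dict:
    """Enumerate one allele from each parent and return genotype fractions."""
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def chi_square(observed: dict, expected_fractions: dict) -> float:
    """Chi-square goodness-of-fit statistic for observed genotype counts."""
    n = sum(observed.values())
    return sum(
        (observed.get(genotype, 0) - fraction * n) ** 2 / (fraction * n)
        for genotype, fraction in expected_fractions.items()
    )


expected = expected_genotype_fractions("AC", "AC")
print(expected)

# Hypothetical genotype counts in 100 affected siblings
observed = {"AA": 40, "AC": 45, "CC": 15}
print(chi_square(observed, expected))
```

With three genotype classes there are 2 degrees of freedom, so a statistic above about 5.99 would be significant at the 5% level.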
How can model free linkage analysis be extended beyond sibling pair analysis?
Can look at affected pedigree member analysis - calculate the fraction of genes shared between members of a pedigree and work out the null hypothesis for pairwise comparisons
What is homozygosity mapping?
Searching for shared homozygous segments (both alleles being inherited from a common ancestor)
What is autozygosity mapping?
Homozygosity mapping with small, interrelated family pedigrees