Bioinformatics Flashcards

1
Q

What is DNA encoded by and what is its purpose?

A

▪ Purpose: DNA holds hereditary information

▪ Encoded by: its nucleotide sequence (the order of the bases A, T, G and C)

2
Q

Describe the nucleotide base pairs in DNA and what their purpose is

A

▪ Complementary base pairing (A with T, G with C)
  • Allows copying and repair
  • Provides a mechanism for heredity
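
A minimal Python sketch of why complementarity allows copying: given one strand, the pairing rules fix the other. The example sequence and the function names are hypothetical, chosen only for illustration.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base partner of a strand, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand read 5' to 3' (the two strands are anti-parallel)."""
    return complement(strand)[::-1]


template = "ATGCGT"  # hypothetical example sequence
print(complement(template))          # TACGCA
print(reverse_complement(template))  # ACGCAT
```

Taking the reverse complement twice returns the original strand, which is the sense in which each strand carries enough information to rebuild the other.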

3
Q

Why do organisms differ from each other?

A

• Organisms differ due to different sequences

4
Q

What is the genetic similarity between 2 humans?

A

o The genetic similarity between two humans is 99.9%

5
Q

What is the genetic similarity between a human and a banana?

A

o The genetic similarity between a human and banana is 60%

6
Q

Describe the structure of DNA

A

▪ DNA is made of nucleotides joined into strands
  • Two anti-parallel strands form a double helix
  • The strands are complementary to each other
  • Genes are spaced along the strands

7
Q

What is DNA packaged in?

A

▪ DNA is packaged into chromosomes

8
Q

What is the genome?

A

• Genome: the complete DNA (or, for some viruses, RNA) sequence required to make and maintain an organism

9
Q

What is genomics?

A

• Genomics: the study of the entire genome and its products (RNA, proteins)

10
Q

What is the aim of genomics and what does it involve?

A

o Aim: to understand how the genome contributes to the functioning of the cell, organism, population and ecosystem
o Involves large-scale molecular biology and genetics, which generate complex data sets
o Relies heavily on automated data acquisition and computer-based data analysis (bioinformatics)

11
Q

How many base pairs are there in the human genome?

A

• More than 3 billion base pairs

12
Q

How many protein-coding genes are there in the human genome?

A

• About 20,000 protein-coding genes (early estimates were around 30,000)

13
Q

What is the function of interspersed DNA?

A

o Interspersed DNA (the sequence lying between genes) seems to have a regulatory function

14
Q

Why is genomics potentially useful?

A

• Human health
  o Detect genetic variants that increase disease risk
  o Identify cancer mutations and search for cures/treatments
  o Pathogen identification and mitigation (e.g. SARS)
    ▪ Identify antibiotic targets
    ▪ Rapid disease identification
• Agriculture/animal breeding
  o Enhance productivity, consistency and progeny quality
  o Identify disease-causing genetic variants
• Study the biology of non-model organisms
• Answer fundamental questions

15
Q

Describe 2 methods for sequencing DNA

A
  • Sanger sequencing (older method, still in use)
  • De novo sequencing of genomes

16
Q

Describe how Sanger sequencing works

A

• Sanger sequencing (older method, still in use):
  o Use PCR, or cloning into a plasmid, to make many copies of the same piece of DNA (the gene of interest)
    ▪ The DNA is separated into 2 strands
  o Dye-terminator (Sanger) sequencing of the product
    ▪ Take the purified DNA copies
    ▪ Add dNTPs, a primer (complementary to one end of the fragment) and DNA polymerase
    ▪ Add ddNTPs: fluorescently labelled chain terminators that identify the last base incorporated into the chain. They have no 3’-OH group
      • Once a ddNTP is added to the chain, chain elongation stops
    ▪ The terminated fragments are sorted by length, and their labelled end bases are read in order to reveal the original DNA sequence
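
A minimal Python sketch of the read-out step, assuming every terminated fragment is a prefix of the newly synthesised strand and that one fragment exists for each length. The function names and the example strand are hypothetical.

```python
import random


def terminated_fragments(new_strand: str) -> list[str]:
    """One fragment per position: the chain stops where a ddNTP was incorporated."""
    fragments = [new_strand[:end] for end in range(1, len(new_strand) + 1)]
    random.shuffle(fragments)  # in the tube the fragments are mixed together
    return fragments


def read_sequence(fragments: list[str]) -> str:
    """Sort by length (as electrophoresis does) and read each labelled end base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


mixture = terminated_fragments("GATTACA")  # hypothetical example strand
print(read_sequence(mixture))  # GATTACA
```

Only the last base of each fragment carries a dye, so the colour order of the length-sorted fragments is the base order of the strand.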

17
Q

What is a chromatogram and how do you read one?

A

o Chromatogram: the output of Sanger sequencing
  ▪ Each peak represents the fluorescent signal from a labelled nucleotide
    • Each of the four nucleotides is labelled with a different colour, so the order of the colours gives the order of the nucleotides

18
Q

What lengths of DNA is Sanger sequencing suitable for?

A

▪ Suitable for short stretches of DNA (about 700 bp), but not for long stretches, which would take too much time
▪ A whole genome could only be sequenced a bit at a time

19
Q

What are the advantages of de novo sequencing of genomes?

A
• De novo sequencing of genomes
  o Newer method of sequencing
    ▪ Works for any organism
    ▪ No need to know the gene sequence in advance
      • No sequence-specific primers are needed
  o Very rapid
  o Quite cheap
    ▪ Human genome: about $1000 USD
    ▪ Send a sample to a lab: the sequence may come back in small pieces within about 5 days, and the entire assembled genome within about 2 months
20
Q

What other names is de novo sequencing known by?

A

o Whole genome shotgun sequencing, massively parallel sequencing, or a name based on the machine manufacturer (e.g. Illumina or 454)

21
Q

How does whole genome shotgun sequencing occur?

A

▪ Extract DNA from cells, giving many copies of the genome (one per cell)
▪ High-intensity sound waves break the DNA into billions of pieces, each only about 600 bases long
▪ Special tags are added to the ends of the fragmented DNA
  • Adaptors are added to the fragments to anchor them to a support
▪ Tagged strands of DNA attach to a tagged slide
▪ In the sequencer, each piece of DNA is copied hundreds to thousands of times
  • PCR amplifies each fragment
  • This creates clusters of identical DNA fragments
▪ The sequencer reads the DNA in parallel, one base at a time, using a different coloured tag for each DNA base
▪ Special sensors within the machine detect the different coloured tags
▪ Computers piece the individual reads together: they lay out the reads, use overlaps to build a consensus, and determine the order and orientation of contigs
▪ Genome sequenced
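
The computational step (piecing reads together using overlaps) can be sketched as a toy greedy assembler in Python. This is an illustration under strong assumptions, not how production assemblers work: reads are error-free, all come from the same strand, and the genome has no repeats. The function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["AGCTTGA", "TTGACCG", "ACCGTA"]  # hypothetical reads
print(greedy_assemble(reads))  # ['AGCTTGACCGTA']
```

If two groups of reads share no overlap of at least `min_len` bases, the function returns more than one contig, which mirrors a gap in coverage.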

22
Q

What are reads?

A

• Reads: the short stretches of sequence produced by the sequencer

23
Q

Why do we want to sequence the strand many times in whole genome shotgun sequencing?

A

• Sequencing each stretch many times (sequencing in depth) makes it easier to find overlaps between reads

24
Q

What are contigs?

A

• Contig- stretch of contiguous sequence

o Reads stitched together with no gaps

25
Q

Describe how errors are dealt with in de novo sequencing

A

o Dealing with errors in de novo sequencing
  ▪ Take the most common base among the multiple reads covering each position to reach a consensus
  ▪ The more reads cover a position, the more confident the consensus
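
A minimal Python sketch of consensus calling, assuming the reads have already been aligned to each other and padded with "-" where a read does not cover a position. The function name and the example reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Most common base in each column; 'N' where no read covers the position."""
    length = max(len(read) for read in aligned_reads)
    called = []
    for pos in range(length):
        column = [read[pos] for read in aligned_reads
                  if pos < len(read) and read[pos] != "-"]
        if not column:
            called.append("N")
            continue
        base, _count = Counter(column).most_common(1)[0]
        called.append(base)
    return "".join(called)


reads = [
    "ACGTAC--",
    "ACGAACGT",  # sequencing error at the fourth base (A instead of T)
    "--GTACGT",
]
print(consensus(reads))  # ACGTACGT
```

With only two reads over a position a single error produces a tie, which is why deeper coverage gives a more trustworthy consensus.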

26
Q

What is deep sequencing?

A

• Deep sequencing- sequence every nucleotide many times

27
Q

What amount of overlap is acceptable in de novo sequencing?

A

▪ The amount of overlap acceptable for assembly depends on the error rate

28
Q

Describe the relationship between read length and assembly

A

▪ The longer the reads, the easier the assembly

29
Q

Describe the relationship between error rate and assembly

A

▪ A high error rate makes the genome harder to assemble

30
Q

What is the best type of de novo sequencing in terms of reads and error rate?

A

▪ The ideal sequencing method gives long reads with no errors
  • In practice there is a trade-off between read length and error rate

31
Q

What is Moore’s law?

A

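• Moore’s law: the observation that the number of transistors on a chip (and so computing power for a given cost) doubles roughly every two years

o The cost of genome sequencing has likewise declined over time and is often plotted against Moore’s law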

32
Q

What are the pros and cons of sequencing your own genome?

A

Pros
• Find out about disease risks so that they can be mitigated
• Sequence your genome alongside a partner’s to find the risk of passing a disease on to a child

Cons
• Privacy/security issues
  o e.g. life insurance
• Worry about an increased risk of a disease that can’t be cured
• Risk estimates are not precise
33
Q

What are the factors affecting genome assembly?

A
  • Error rate in DNA sequencing
  • Read length
  • Repetitive DNA
  • Size of the genome
  • Number of reads
34
Q

Describe how error rate in DNA sequencing affects genome assembly

A

o Because of sequencing errors, reads that truly overlap may not be identical, so overlaps are harder to detect

35
Q

Describe how read length affects genome assembly

A

o Shorter reads will be harder to assemble (less overlap)

36
Q

Describe how repetitive DNA affects genome assembly

A

o Repetitive DNA maps to multiple locations within the genome and can’t be assembled properly, which is why a small part of the human genome (about 1%) long remained unsequenced
  ▪ Since repeated sequences are identical, they cannot be assigned to a unique genomic location; hence the relative locations and orientations of contigs cannot be determined
  ▪ Examples: transposons and retrotransposons
o Repetitive DNA makes genome assembly hard

37
Q

What percentage of the human genome is repeats?

A

o 50% of the human genome is repeats

38
Q

Describe how the size of the genome affects genome assembly

A

o The larger the genome, the more reads are required to cover it and the more difficult it is to assemble

39
Q

What is genome size and what is it typically measured in?

A

o Genome size: the total number of base pairs of DNA in a haploid set of all chromosomes of an organism. Typically measured in:
  ▪ Base pairs (bp)
  ▪ Kilobase pairs (kb, 1,000 bp)
  ▪ Megabase pairs (Mb, 1,000,000 bp)

40
Q

How does number of reads affect genome assembly?

A

o More reads give a better assembly (deeper coverage)
  ▪ Higher coverage (average redundancy) is better
o More opportunity to correct errors and find overlaps

41
Q

What is coverage?

A

▪ Coverage: the average number of times that each base pair in the genome has been sequenced

42
Q

How do you calculate genome coverage?

A

▪ Genome coverage = Number of reads × (Average read length / Length of genome or contig)
  • Units must be consistent in the calculation, so check the units
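
A worked example of the coverage formula as a short Python sketch. The read counts, read lengths and genome sizes are made-up illustrative numbers, and the function names are hypothetical; everything is expressed in base pairs to keep the units consistent.

```python
import math


def genome_coverage(num_reads: int, avg_read_length_bp: float,
                    genome_length_bp: float) -> float:
    """Coverage = number of reads * (average read length / genome length)."""
    return num_reads * avg_read_length_bp / genome_length_bp


def reads_needed(target_coverage: float, avg_read_length_bp: float,
                 genome_length_bp: float) -> int:
    """The same formula rearranged to give the number of reads."""
    return math.ceil(target_coverage * genome_length_bp / avg_read_length_bp)


# 1,000,000 reads of 150 bp over a 5 Mb genome (5,000,000 bp, not "5")
print(genome_coverage(1_000_000, 150, 5_000_000))  # 30.0

# Reads required for 30x coverage of a 3,000,000,000 bp genome with 150 bp reads
print(reads_needed(30, 150, 3_000_000_000))  # 600000000
```

Entering the genome length as 5 (Mb) while the read length is in bp would inflate the answer a million-fold, which is the unit check the card warns about.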

43
Q

What is the purpose of paired-end sequencing?

A

• To help solve the issue of repetitive DNA

44
Q

How do you perform paired-end sequencing?

A
  • Sequence both ends of DNA fragments of known size
  • Can flank repetitive elements and assign them to some scaffold
  • Sequenced fragment needs to be longer than the repetitive element for paired-end sequencing to work
45
Q

What is a scaffold?

A

• Scaffold- long segment of genome containing multiple contigs with known orientation and some missing data

46
Q

What will a sequenced genome include and what is the function of these inclusions?

A

• The sequenced genome will include:
  o Genes transcribed into RNA
    ▪ Some are translated into proteins
  o Recognition sites that interact with DNA/RNA/protein: important for regulating gene expression
  o Structural sites: important for packaging DNA around histones and into chromosomes
  o Transposable elements
  o Other: interspersed DNA

47
Q

What is the aim of annotating the genome?

A

• Aims:
o Identify the location of all genes and other functional elements (locus: the address of the gene)
o Define biochemical, cellular and biological function of each gene product

48
Q

What is annotation of a genome?

A

• Annotation- attaching biological functions to DNA sequences, based on experimental evidence or computational analysis

49
Q

Are the genomes of all species equally easy to assemble and annotate?

A

• Genome organization varies between species
o Viruses and bacteria are easy to assemble and annotate
o Animals and plants are harder to assemble and annotate

50
Q

Why are the genomes of viruses and bacteria easier to assemble and annotate?

A

▪ Fewer genes and higher gene density than eukaryotes
▪ Lack of introns
▪ Compact gene regulatory sequences
▪ Overall lower complexity of protein structure

51
Q

Why are the genomes of animals and plants harder to assemble and annotate?

A

▪ Larger, more complex genomes (a lower proportion of protein-coding sequence, introns, complex regulatory sequences)
▪ Alternative splicing (exons combined in different ways to produce different RNAs) and large, complex gene families

52
Q

What is the relationship between organism complexity and the proportion of the genome that is actually expressed genes?

A

o The more complex an organism is, the lower the proportion of the DNA in its genome that is actually an expressed unit of inheritance, because more of the genome is regulatory sequence

53
Q

How do you annotate a genome?

A
• Identify putative gene/functional sequences in the genome
  o Through experimental approaches
  o Through computational approaches
• Assign biological function
  o Through similarity searches
  o Through further experimentation
54
Q

How do you identify putative gene/functional sequences in the genome through experimental approaches and what organisms does this work the best with?

A

• Common with model organisms
• Compare expressed gene sequences to the genome to identify transcribed sequences
  o cDNA is made artificially from mRNA transcripts taken out of the cell
    ▪ The enzyme reverse transcriptase copies mRNA into cDNA
      • The cDNA contains only exon sequence, because introns have already been spliced out of the mRNA
  o See whether the cDNA lines up with DNA in the genome
    ▪ Short sequenced cDNAs are called expressed sequence tags (ESTs)
      • ESTs: bits of the genome that have been shown experimentally to be transcribed
  o If a piece of cDNA matches the genome, the matching sequence must be part of a transcribed gene (an exon); genomic sequence lying between matched exons of the same gene is intron
• A complete cDNA clone set from an organism would allow complete annotation of its protein-coding genes

55
Q

What is the disadvantage of identifying putative gene/functional sequences in the genome through experimental approaches?

A

o To annotate every single gene in the genome using cDNA matching, you would need to sequence mRNA from every cell and tissue type, at all stages of development and under all possible environmental conditions
  ▪ Some genes are only conditionally switched on
o Alternative splicing: different exons can be combined in different ways within a gene, giving multiple possible mRNAs, so the full range of exon combinations has to be captured
o Hence, this method is not used much for vertebrates

56
Q

How can comparative genomics (a computational approach) be used to identify putative gene/functional sequences in the genome?

A

• Comparative genomics
o Compare genome sequences and annotations from related species to refine annotations
o Sequences conserved among species are more likely to be functional than non-conserved sequences

57
Q

How can computational approaches identify putative gene/functional sequences in the genome when no comparative genomics data are available?

A

o Eukaryotic genomes may contain thousands of genes with no experimental data available
o Algorithms predict gene structure by recognising certain sequence motifs known from many other organisms

58
Q

What are open reading frames?

A

▪ Open reading frames (ORFs): sequences that could encode polypeptides
  o An open reading frame is the part of a reading frame that has the potential to be translated
  o It is a continuous stretch of codons in the same frame that begins with a start codon and ends with a stop codon

59
Q

What is included in the open reading frames of eukaryotes?

A

• Eukaryotes: open reading frames span intron/exon regions

60
Q

How do you predict open reading frames from the genome using computational approaches, and how accurate is this approach? What algorithms are involved?

A

o Algorithms predict which regions are introns or exons by looking at splice sites in open reading frames
  • Most algorithms search for open reading frames longer than some minimum size, to exclude small regions that cannot be genes
  • Some inaccuracy
    o Especially for single small exons separated by very large introns
    o 50-90% accurate in predicting exons
    o Predictions need to be confirmed experimentally

• Process:
  o Identify the 3 reading frames in the forward direction and the 3 reading frames on the complementary strand
  o Highlight all potential start codons
  o Highlight any stop codons that are in the same reading frame as the identified start codons
  o Identify the open reading frames and the corresponding amino acid sequences
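
The process above can be sketched in Python as a simple six-frame ORF scan. It assumes the standard genetic code, ATG as the only start codon, and a sequence without introns (so it suits the prokaryote-style case, not spliced eukaryotic genes). The function names, the `min_codons` parameter and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks the three stop codons (TAA, TAG, TGA).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
START = "ATG"


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(orf: str) -> str:
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[str, int, str]]:
    """Return (strand, frame, ORF) for every start-to-stop stretch of codons."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):  # 3 frames per strand, 6 in total
            codons = [s[i:i + 3] for i in range(frame, len(s) - 2, 3)]
            start = None
            for idx, codon in enumerate(codons):
                if start is None and codon == START:
                    start = idx  # first start codon in this frame
                elif start is not None and CODON_TABLE[codon] == "*":
                    if idx - start + 1 >= min_codons:  # minimum-size filter
                        orfs.append((strand, frame, "".join(codons[start:idx + 1])))
                    start = None
    return orfs


for strand, frame, orf in find_orfs("TTATGGCCAAATGACC"):
    print(strand, frame, orf, translate(orf))  # + 2 ATGGCCAAATGA MAK*
```

Raising `min_codons` plays the role of the minimum ORF size mentioned above: short start-to-stop stretches that arise by chance are discarded.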

61
Q

How are protein domains predicted by looking at a genome using computational approaches?

A
▪ Protein domain searches: a particular amino acid sequence may be important for a particular function
• Predicting protein domains
  o Functionally important parts (protein domains) can be identified computationally using characteristic sequence motifs and similarity
  o Conserved (similar) protein domains provide clues about biochemical activities
62
Q

Describe protein domains in eukaryotic proteins and how they occur

A

o Eukaryotic proteins are often modular
  ▪ Distinct domains are joined together; some domains are found in numerous genes
  ▪ This arises by exon shuffling via duplications, translocations and inversions

63
Q

Describe why functional RNAs that don’t code for proteins are hard to predict using computational approaches looking at a genome

A

• Functional RNAs that don’t code for proteins are difficult to predict computationally (they don’t have protein motifs)

64
Q

How are tRNAs identified using computational approaches looking at a genome sequence?

A

• To identify tRNAs:

o Search for the characteristic complementary sequences that allow the RNA to fold back on itself

65
Q

What is the structure of a eukaryotic gene?

A
  • Start codon
  • Stop codon
  • Exons
  • Introns
  • Promoter
  • 5’ Untranslated region
  • 3’ untranslated region
66
Q

Describe the principle of assigning biological function to a gene through similarity searches

A

o Genes with similar sequences are assumed to encode products with similar biochemical functions
o Initial annotation categorises genes by presumed function on the basis of BLAST searches
o About 50% of predictions are based on sequence similarity to known proteins
o Predictions need to be confirmed experimentally, but this is not always possible

67
Q

What is common software used to assign biological function to a gene through similarity searches and how does it work?

A

▪ A common tool for finding similar sequences is BLAST (Basic Local Alignment Search Tool)
  • Searches your gene of interest (the query sequence) against a database of nucleotide or protein sequences
▪ BLAST process:
  • Breaks the query into short words of customisable length and looks for matching words in the database entries
  • Where a word in your query sequence exactly matches the database, the comparison is extended in both directions
  • Extension stops once the alignment score stops improving
  • Allows some mismatches (customisable mismatch rate)
  • Hits (matching sequences) must reach a minimum score (customisable)
    o The score is based on identical and non-identical nucleotides/amino acids
  • Different types of search are available depending on the query type and which database you want to search (nucleotide vs protein databases)
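
The word-match-then-extend idea can be sketched in Python as follows. This is a toy illustration and not BLAST itself: it assumes a single subject sequence, ungapped extension and a simple match/mismatch score, whereas the real tool adds scoring matrices, gapped alignment and statistical significance. All function and parameter names here are hypothetical.

```python
def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk one way from (qi, sj); return (best score gain, bases kept)."""
    score = best_score = 0
    length = best_length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best_score:
            best_score, best_length = score, length
        elif best_score - score > x_drop:
            break  # score has fallen too far below its best: stop extending
        qi += step
        sj += step
    return best_score, best_length


def seed_and_extend(query, subject, word_size=4, match=1, mismatch=-1, x_drop=2):
    """Best ungapped hit as (score, query_start, subject_start, length), or None."""
    index = {}  # word -> positions where it occurs in the subject
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    best = None
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):  # exact word matches
            right, right_len = _extend(query, subject, i + word_size,
                                       j + word_size, 1, match, mismatch, x_drop)
            left, left_len = _extend(query, subject, i - 1, j - 1, -1,
                                     match, mismatch, x_drop)
            hit = (word_size * match + left + right, i - left_len,
                   j - left_len, word_size + left_len + right_len)
            if best is None or hit[0] > best[0]:
                best = hit
    return best


print(seed_and_extend("GATTACAGG", "CCGATTTCAGGTT"))  # (7, 0, 2, 9)
```

In the example the hit spans 9 bases with one mismatch (8 matches minus 1), showing how extension tolerates a mismatch as long as the score recovers.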

68
Q

What are gene families?

A

• Genomes contain gene families

o Groups of genes that are evolutionarily related by gene duplication

69
Q

How many gene families are there in the human genome?

A

10,000 gene families

70
Q

What does gene family expansion depend on?

A

o Gene family expansion depends on gene function importance to organism

71
Q

What is comparative genomics?

A

• Comparative genomics: inter- or intraspecific comparison of genomes to understand genome evolution and function

72
Q

What are sequences conserved among species more likely to be than non-conserved sequences?

A

• Sequences conserved among species are more likely to be functional than non-conserved sequences

73
Q

What are the uses of comparative genomics?

A

o Elucidates organism relationships
o Elucidates genome changes over evolutionary time
o Elucidates novel gene origins
o Elucidates basis of organismal differences

74
Q

How has comparative genomics been used to elucidate organism relationships?

A

▪ Large amounts of DNA sequence information have helped resolve the tree of life
  • Alignment of homologous nucleotides is used to ascertain phylogenetic relationships
  • The tree is reconstructed on the basis of similarity
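
A minimal Python sketch of the similarity measure behind such comparisons: percent identity between sequences that are assumed to be already aligned, so that each column holds homologous nucleotides. The species labels and sequences are hypothetical, and real tree building uses more sophisticated models than raw identity.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns (ignoring gaps '-') where the bases match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned sequences of the same gene in three species
species_a = "ACGTACGTAC"
species_b = "ACGTACGTTC"
species_c = "ACCTAGGTTA"

print(percent_identity(species_a, species_b))  # 90.0
print(percent_identity(species_a, species_c))  # 60.0
print(percent_identity(species_b, species_c))  # 70.0
```

On these numbers species a and b are the most similar pair, so a similarity-based tree would group them together before joining species c.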

75
Q

What are homologous nucleotides?

A

• Homologous nucleotides: descended from the same nucleotide in the common ancestor of species being compared

76
Q

What is the relationship between sequence similarity and organism relatedness?

A

o The more similar the sequences, the more closely related the organisms

77
Q

Why do rRNA genes provide a universal sequence for comparison between distantly related species?

A

▪ rRNA genes provide a universal sequence for comparison between distantly related species because they are:
  • Ubiquitous: present in all organisms
  • Highly conserved due to their important biological function

78
Q

What type of DNA sequences are used to resolve ancient divergences and why?

A

▪ Highly conserved protein-coding DNA (exon) and rRNA sequences can resolve ancient divergences because that DNA evolves slowly

79
Q

What type of DNA sequences are used to resolve recent divergences and why?

A

▪ Noncoding sequence may clarify very recent relationships, as it can accumulate mutations rapidly

80
Q

How can comparative genomics elucidate genome changes over evolutionary time?

A

o Elucidates genome changes over evolutionary time
  ▪ Two closely related species may share almost 100% of their genomes
    • The genomic differences between sister taxa underlie the phenotypic differences between the species
  ▪ Comparing species of the same genus shows how quickly the genome has changed over evolutionary distance

81
Q

What are homologous genes and what are two types of such genes?

A

• Homologous genes: genes descended from the same gene in the common ancestor of the species being compared
  o Orthologous genes (orthologs)
  o Paralogous genes (paralogs)

82
Q

What are orthologous genes and how are they derived?

A

o Orthologous genes (orthologs): genes in different species derived from a single ancestral gene
  ▪ Usually have the same function in different species
  ▪ Derived by speciation

83
Q

What are paralogous genes and how are they derived?

A

o Paralogous genes (paralogs): genes derived from duplication of a single ancestral gene
  ▪ Usually have related but biologically distinct functions

84
Q

What is a synteny map and how is it used?

A

• Synteny map: made by flow-sorting chromosomes according to size, painting a chromosome with fluorescent dye, and then hybridising the painted chromosome pieces to the chromosomes of a different species
  o Shows which chromosome regions are similar between species

85
Q

How can comparative genomics be used to elucidate novel gene origins?

A

▪ By comparing genome sequences, we can understand how new genes arise

86
Q

How can new genes appear in a genome?

A
  • Gene duplication and divergence
  • Exon shuffling
  • Reverse transcription
  • Derivation of exons from transposons (jumping genes)
  • Horizontal gene transfer
  • Gene fission/fusion
  • De novo derivation from noncoding sequence
87
Q

How can new genes appear through gene duplication and divergence? How does it occur and what are the possible outcomes of it?

A

o An error during DNA replication or recombination can duplicate a gene, leaving the duplicate lying next to the original copy
o The fate of gene duplicates depends on the molecular basis of the duplication
  ▪ As there are 2 copies of the same gene, one copy is redundant and is free to diverge by accumulating mutations

o Fully redundant genes (that is, genes with the same function) are not maintained for long. Possible outcomes:
  ▪ Gene loss: one copy may degenerate (accumulate mutations) because selection no longer maintains it, and become a pseudogene (a now non-functional gene)

  ▪ Subfunctionalization: mutations may produce two different genes with complementary functions
    • Both copies carry mutations, and each copy performs one aspect of the original task
    • Specialisation of the original function

  ▪ Neofunctionalization: a mutation in one copy provides a new function not provided by the other

88
Q

How do most gene families arise, and how often do gene duplications occur?

A

o Most genomes contain gene families derived from duplication events
  ▪ Duplication rate: about 0.01 duplications per gene per million years
    • For an average eukaryotic genome of 10,000-30,000 genes, that is one gene duplication every 3,000-10,000 years
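
The arithmetic behind the 3,000-10,000 year figure, as a short Python check. It assumes the rate is read as 0.01 duplications per gene per million years, which is the reading that reproduces the numbers on the card; the function name is hypothetical.

```python
def years_per_duplication(num_genes: int,
                          rate_per_gene_per_myr: float = 0.01) -> float:
    """Average waiting time, in years, between duplications anywhere in the genome."""
    duplications_per_myr = num_genes * rate_per_gene_per_myr
    return 1_000_000 / duplications_per_myr


for genes in (10_000, 30_000):
    print(genes, round(years_per_duplication(genes)))
# 10000 10000
# 30000 3333
```

So a 10,000-gene genome gains about 100 duplicates per million years (one every 10,000 years), and a 30,000-gene genome about 300 (one every 3,333 years or so).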

89
Q

Describe the case study of the platypus venom gene:

  • Gene family and similarity to other gene family
  • Properties of the gene families
  • Process of identification and localisation
  • Results
  • Testing protein function and principles used to do so

A

o The platypus venom gene: case study
  ▪ Defensin-like peptide (DLP) toxin family
    • Most abundant
    • Unknown role
  ▪ Sequence is similar to beta-defensins (similarity determined using BLAST)
    • Innate immune system
    • Antimicrobial
  ▪ All genes in the defensin-like peptide gene family were identified using BLAST
    • Antimicrobial peptides have different cysteine spacing
      o Cysteine is essential for folding of the protein
    • But there was also a mystery intermediate gene
  ▪ The genes of interest were then localised to a chromosome using fluorescence in situ hybridisation (the gene of interest is labelled and hybridised to platypus chromosomes to show where it lies)
    • Chromosome X2 was found to carry a defensin gene, the toxin genes and the mystery intermediate gene
    • This kind of arrangement is a hallmark of gene duplication
    • The family is thought to have evolved through 4 rounds of gene duplication
  ▪ Test protein function
    • Immune gene is antimicrobial
    • Toxin gene has toxin function
    • Intermediate gene is a mystery
    • Zone of inhibition assay
      o Immune gene product killed bacteria
      o Toxin gene product did not kill bacteria
      o Intermediate mystery gene product killed bacteria slightly
        ▪ It may be losing antimicrobial function, or be intermediate in function between the defensin and toxin genes
    • Reverse transcription PCR
      o The recently evolved toxin gene is expressed only in the venom gland, which suggests that each round of duplication gave a more and more specialised toxin function

90
Q

How can exon shuffling make new genes?

A

• Exon shuffling
o Exons are duplicated and moved around during meiosis, which can produce new combinations of exons and hence a gene with a new function

91
Q

How can reverse transcription make new genes?

A

• Reverse transcription

o RNA transcribed from the DNA is reverse transcribed, and the DNA copy is inserted back into the genome

92
Q

How can derivation of exons from transposons make new genes?

A

• Derivation of exons from transposons (jumping genes)
o Can move different bits of genome around
o Can evolve very quickly

93
Q

How can gene fission/fusion make new genes?

A

• Gene fission/fusion
o Interspersed DNA is accidentally removed from the genome, so 2 genes that were not next to each other before end up adjacent and can take on a new function

94
Q

Why do organisms differ?

A
• Organisms differ due to:
  o Different gene sequences and gene copy numbers
  o Variation in gene regulation
  o Variation in splicing
  o Novel genes
95
Q

What are complex traits?

A

• Complex traits -Underpinned by interacting networks of thousands of genes

96
Q

What is evolutionary innovation?

A

• Evolutionary innovation- novel phenotypic trait that allows subsequent radiation and success of a taxonomic group

97
Q

Why is it useful to study animal pregnancy?

A

• Useful for scientific advancement, for saving critically endangered species by inducing pregnancy in captive animals, and for better understanding human pregnancy using animals as models

98
Q

How many times has viviparity (live birth) evolved?

A

150+ times

99
Q

How did evolution from oviparity to viviparity occur?

A

• Evolution from oviparity to viviparity
  o Internal fertilisation first
  o Egg retention second
  o Selective pressures for adaptation at the site of gestation
    ▪ Reduction in egg coverings (reduced eggshell)
    ▪ Maternal-fetal interactions

100
Q

What are challenges that may have driven viviparity adaptations?

A

▪ Gas exchange
▪ Immunoregulation: the embryo must not be recognised as foreign, or it would be rejected by the parent’s body
▪ Waste removal: preventing a build-up of ammonia, which could kill the embryo
▪ Nutrient supply: e.g. through the umbilical cord between mother and baby

101
Q

How does syngnathid fish reproduction occur?

A

• Courtship occurs
• The female transfers eggs into the male’s brood pouch as the pair rise up the water column
• The female leaves the eggs in the pouch
• The father fertilises the eggs in the pouch, carries the developing embryos for 20-25 days and then gives birth to them
  o The male has contractions
• Syngnathids are the only vertebrates that have male pregnancy
• If the genetic basis of pregnancy in male seahorses is similar to that in female vertebrates, then the evolution of pregnancy is predictable (convergent evolution at the genetic level)

102
Q

How do we define males and females in ALL organisms?

A

• Define male vs female by gamete size
o Female: large sex cell (ovum)
o Male: small sex cell (sperm)

103
Q

Describe the composition of a seahorse non-pregnant brooding pouch

A

• Non-pregnant pouch- thin lining and filled with seawater

104
Q

Describe the composition of a seahorse pregnant brooding pouch

A

• Pregnant pouch: the lining becomes thick as embryos embed themselves into it, the fluid changes and a septum develops

105
Q

Describe how genomics can be used to explore what is going on during pregnancy (pregnancy functions)

A

• Use shotgun sequencing to find out what is going on inside the male brood pouch
• Pregnancy functions were identified by comparing gene expression between reproductive stages
  o Count mRNA in pregnant vs non-pregnant fish and look for differences in gene expression
  o Look for genes that are differentially expressed (have different mRNA amounts) between treatments/timepoints
    • Downregulated genes (high mRNA levels in non-pregnant fish but low levels in pregnant fish) most probably have a pregnancy function
    • Upregulated genes (low mRNA levels in non-pregnant fish but high levels in pregnant fish) most probably have a pregnancy function
    • Unchanged genes (the same mRNA levels in pregnant and non-pregnant fish) are most probably maintenance genes
• Measuring gene expression:
  o RNA-seq = RNA (cDNA) sequencing
    ▪ Massively parallel sequencing: uses the same technologies as whole genome shotgun sequencing
    ▪ Also called whole transcriptome shotgun sequencing
      • Sequence cDNA as reads
      • Reads can be aligned to gene sequences
      • Compare read numbers between treatments to identify upregulated/downregulated genes
• Pouch gene expression was measured and compared between reproductive stages to identify pregnancy genes
• Sequences were searched against databases of known genes using BLAST
  o This predicted the functions of genes that were up- or downregulated during pregnancy
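
A minimal Python sketch of comparing read numbers between treatments, assuming reads have already been aligned and counted per gene. The gene names and counts are invented for illustration, the twofold threshold is an arbitrary choice, and real analyses use biological replicates and statistical tests rather than a single ratio.

```python
import math


def log2_fold_changes(control: dict[str, int], treated: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2(treated / control) per gene, after scaling to counts per million."""
    total_control = sum(control.values())
    total_treated = sum(treated.values())
    changes = {}
    for gene in sorted(control.keys() & treated.keys()):
        cpm_control = control[gene] / total_control * 1_000_000
        cpm_treated = treated[gene] / total_treated * 1_000_000
        # The pseudocount avoids dividing by zero for unexpressed genes.
        changes[gene] = math.log2((cpm_treated + pseudocount) /
                                  (cpm_control + pseudocount))
    return changes


def classify(log2_fc: float, threshold: float = 1.0) -> str:
    """Label a gene using a twofold-change cut-off (log2 of 2 is 1)."""
    if log2_fc >= threshold:
        return "upregulated"
    if log2_fc <= -threshold:
        return "downregulated"
    return "unchanged"


# Hypothetical read counts per gene in the brood pouch
non_pregnant = {"gene_a": 100, "gene_b": 800, "gene_c": 100}
pregnant = {"gene_a": 800, "gene_b": 100, "gene_c": 100}

for gene, change in log2_fold_changes(non_pregnant, pregnant).items():
    print(gene, classify(change))
# gene_a upregulated
# gene_b downregulated
# gene_c unchanged
```

Scaling to counts per million matters because two sequencing runs rarely produce the same total number of reads; without it, a deeper run would make every gene look upregulated.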

106
Q

What is the transcriptome?

A

o Transcriptome: full set of expressed genes in a tissue/treatment/time-point (study of transcriptomes is transcriptomics)

107
Q

Describe the genes upregulated during syngnathid fish pregnancy and why these are upregulated

A

• Upregulated pregnancy genes:
  o Pouch remodelling
  o Nutrient transport
  o Waste removal
  o Immune genes
• APOM is the most differentially expressed gene in pregnant syngnathid fish compared with non-pregnant ones
  o APOM is a lipid transporter
  o Suggests that male seahorses transport lipids to embryos in early, mid and late pregnancy
  o Supports the energy demands of the embryos
• Inorganic (mineral) transporters are strongly upregulated during syngnathid fish pregnancy
  o This may be due to heavy calcium requirements for building the skeleton
• Waste transporters that remove carbon dioxide and ammonia are upregulated during syngnathid fish pregnancy
• Antibacterial/antifungal immune genes are upregulated during pregnancy
  o Protect embryos from seawater pathogens
• A gene expressed only during mid and late syngnathid fish pregnancy encodes the hatching enzyme (normally present in fish embryos), which causes embryos to hatch
  o During mid and late pregnancy, the embryos hatch out of their covering and swim freely within the brood pouch
  o Fathers upregulate this gene to signal to the embryos that it is time to hatch
    ▪ Could be an example of convergent evolution

108
Q

What is a downregulated gene during syngnathid fish pregnancy?

A

• Inflammatory immune genes are downregulated during pregnancy

o If these genes are expressed too highly, the embryo is rejected

109
Q

What were the functions of the male syngnathid fish brooding pouch inferred from the RNA-Seq experiment?

A
• Functions of the brood pouch inferred from RNA-Seq
  o Nutrient provisioning
    ▪ Transport of lipids
    ▪ Transport of minerals
  o Immunological protection of developing embryos
    ▪ From bacteria and fungi
    ▪ From the father’s immune system
  o Osmoregulation
    ▪ Via ion transporter genes
  o Waste removal
    ▪ Carbon dioxide and ammonia
110
Q

Describe the effect of estrogen on pathways involved in pregnancy in mammals and seahorses and what this shows

A

• Estrogen receptor in mammals and seahorses
o The estrogen receptor is downregulated during pregnancy (for quiescence) and upregulated in response to oxytocin, which acts on a particular MAPK pathway
o The MAPK pathway is upregulated: it responds to estrogen and uterine stretching, which indicate the onset of labor
• Seahorses and other vertebrates share common genetic pathways in parturition

111
Q

Are pregnancy trends predictable? What shows this?

A

• Seahorse pregnancy evolved independently, yet it shares many similarities with pregnancy in other vertebrates, which indicates that pregnancy trends are predictable