Bioinformatics Flashcards
How has Biology become deluged with data?
- Nucleotides:
cDNAs (including ESTs)
Whole genomes
DNA polymorphisms
Copy number variations
DNA mutations
Microarray results
Epigenetics
- Proteins:
Amino acid sequences
Protein structures
Proteomics
Protein interaction maps
- Cells:
Cell signalling
Microscopy of living cells
What is Bioinformatics?
- Bioinformatics provides the tools for accessing and managing large amounts of biological data, as well as the algorithms to assess relationships among members of the data sets.
What is GenBank?
GenBank is huge, growing, and freely available.
GenBank is the NIH’s (National Institutes of Health, USA) annotated collection of all publicly available DNA sequences.
Figures as of August 2020:
- 218,642,238 sequences in traditional GenBank records
- 1,901,329,611 sequences in WGS (whole-genome shotgun) records
What is the Goal of Bioinformatics?
A goal of bioinformatics is to provide new insights into biological questions by enabling a more global view of a research question.
- More genes
- More proteins
- More sequences
- More genomes
What is PAX6?
- The transcription factor PAX6 is the focus of the bioinformatics practicals.
- PAX6 is conserved across a wide range of species (from flies to humans).
- Many interesting PAX6 mutations affect eye development.
- Crystal structures are known for the two DNA-binding domains of PAX6: the homeobox domain and the PAX domain.
Transcription factors such as PAX6 recruit RNA Polymerase II via mediators
- The promoter is the DNA sequence where the general transcription factors and the polymerase assemble
- The cis-regulatory sequences are binding sites for transcription regulators, whose presence on the DNA affects the rate of transcription initiation.
- These sequences can be located adjacent to the promoter, far upstream of it, or even within introns or entirely downstream of the gene.
- The broken stretches of DNA signify that the length of DNA between the cis-regulatory sequences and the start of transcription varies, sometimes reaching tens of thousands of nucleotide pairs in length.
- The TATA box is a DNA recognition sequence for the general transcription factor TFIID.
- As shown in the lower panel, DNA looping allows transcription regulators bound at any of these positions to interact with the proteins that assemble at the promoter.
- Many transcription regulators act through Mediator, while some interact with the general transcription factors and RNA polymerase directly.
- Transcription regulators also act by recruiting proteins that alter the chromatin structure of the promoter.
- Whereas Mediator and the general transcription factors are the same for all RNA polymerase II-transcribed genes, the transcription regulators and the locations of their binding sites relative to the promoter differ for each gene.
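The idea that a promoter element such as the TATA box is a recognisable DNA sequence can be illustrated with a small motif scan. This is only a sketch: the consensus TATA(A/T)A(A/T) used here is a commonly quoted textbook form and is an assumption, not something stated in these notes, and the promoter fragment and the function name `find_tata_boxes` are made up for illustration.

```python
import re

# Assumed consensus: TATA(A/T)A(A/T). The lookahead lets overlapping
# sites be reported as separate hits.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(sequence):
    """Return (0-based start, matched text) for each TATA-like site."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(seq)]


# Hypothetical promoter fragment, invented for this example.
promoter = "GGCCGCTATAAAAGGCGCGTTCAGC"
hits = find_tata_boxes(promoter)
for position, site in hits:
    print(f"TATA-like site {site} at position {position}")
```

Real promoters often lack a clean TATA box, so a match from a scan like this suggests a candidate site and does not prove that a promoter is there.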
PAX6 homologs
- Drosophila: ey (eyeless)
- Mouse: Sey (Small eye)
- Human: PAX6
All three genes (ey, Sey, PAX6) are homologs. They are descended from a common ancestral gene.
Conservation of PAX6
In humans, some PAX6 mutations are associated with aniridia.
PAX6 is a “toolkit” protein for development
- The amino acid sequence of PAX6 is conserved across species.
- Biochemical activity is conserved across species (interaction with DNA and other proteins involved in transcription). Mouse PAX6 will work in flies.
- PAX6 regulates about 300 genes.
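Sequence conservation across species is usually quantified as percent identity between aligned sequences. The sketch below assumes the two sequences have already been aligned (gaps written as `-`) and counts identical residues over the columns where neither sequence has a gap. The short peptide strings are hypothetical and are not real PAX6 sequences, and `percent_identity` is a name chosen for this example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, invented for this example.
species_1 = "MKRS-LQW"
species_2 = "MKRTALQW"
print(f"{percent_identity(species_1, species_2):.1f}% identical")
```

Other conventions divide by the full alignment length or by the shorter sequence, so a reported identity value should say which denominator was used.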
Domain Structure of PAX6
- A domain is a sequence of amino acids which can fold by itself (without the rest of the protein).
- PAX6 contains a pax domain and a homeobox domain.
- Both bind specific DNA sequences.
Structure of the Homeobox Domain of PAX6
The homeobox domain is a common domain for binding DNA.
Transcription Factors can “read” the DNA to bind in the correct place
- On the left, a single contact is shown between a transcription regulator and DNA; such contacts allow the protein to “read” the DNA sequence. On the right, the complete set of contacts between a transcription regulator (a member of the homeodomain family) and its cis-regulatory sequence is shown.
- The DNA-binding portion of the protein is 60 amino acids long. Although the interactions in the major groove are the most important, the protein is also seen to contact both the minor groove and phosphates in the sugar-phosphate DNA backbone.
The Paired Box Domain (PAX) binds DNA
The paired box domain (PAX) is common to many DNA-binding proteins.
How can CRISPR repair Pax6 and treat aniridia in mice?
- An optimized Cas9 ribonucleoprotein complex and a single-stranded oligodeoxynucleotide containing the 3xFLAG sequence were microinjected into one-cell mouse zygotes to generate transgenic animals in one step.
- (A) Slit lamp analysis of the transgenic mouse eyes. As expected, images of Fey showed small eyes and corneal opacity, not significantly different from the Sey (Pax6 small eye) mouse model (two-tailed test, p > 0.999).
- Slit lamp images of Fax (FLAG-tagged Pax6) mouse eyes showed a normal iris and clear cornea, which were not significantly different from WT (wild-type) eyes.