Week 6.11.12 Visualisation Flashcards
6.1 Visualisation
Topic: Tables, ideograms, genome browsers, Galaxy, CLC Workbench, integrative maps.
Reading: EPG Ch 4
Learning outcomes
- Following this lecture (and attending the workshop and doing the associated private study) you should be able to:
- Recognise why genome visualisation is important and appreciate the particular challenges that it poses. Demonstrate an awareness of the history of visualisation.
- Identify visualisation methods that are available for human genome visualisation and how they may be used.
- Apply web-based genome visualisation tools to analyse the human genome.
- Describe some of the technical solutions for genome visualisation.
What is the point of having base pair reading visualisation?
Human chromosome 3 is approximately 198,022,480 base pairs in length
This is 0.0005% of it:
[IMAGE]
The truth is that this kind of visualisation is useless on its own, because there are no annotations.
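To put the 0.0005% figure in perspective, the arithmetic can be sketched as below. This is only a back-of-the-envelope calculation using the approximate chromosome length quoted above; the variable names are made up for illustration.

```python
from fractions import Fraction

CHR3_LENGTH_BP = 198_022_480         # approximate length quoted in the notes
PERCENT_SHOWN = Fraction(5, 10_000)  # 0.0005 (a percentage)

# Convert the percentage to a plain fraction of the chromosome.
fraction_shown = PERCENT_SHOWN / 100

# Number of bases visible in one view, and how many such views
# would be needed to page through the whole chromosome.
bases_shown = float(CHR3_LENGTH_BP * fraction_shown)
views_needed = int(1 / fraction_shown)

print(f"Bases in one view: about {bases_shown:.0f}")
print(f"Views needed for the whole chromosome: {views_needed:,}")
```

So a single screen of raw sequence is roughly a thousand bases, and it would take a couple of hundred thousand such screens to read one chromosome.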
Genome variation data
Much less data than a whole genome, but still meaningless in its raw form.
File for 23andMe
This lists the individual SNP locations, telling you which chromosome each SNP is on. This is slightly more readable, but these files are still long. Although they are more to the point, looking at one won’t tell somebody much.
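A minimal sketch of how such a file could be read and summarised is given below. It assumes a tab-separated layout with rsid, chromosome, position and genotype columns, plus comment lines starting with "#"; check this against the actual file. The rows in SAMPLE are invented placeholders, not real SNPs, and the function name is hypothetical.

```python
import io
from collections import Counter

# Invented example rows (NOT real data), in the assumed layout:
# rsid <tab> chromosome <tab> position <tab> genotype
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t12345\tAA
rs0000002\t1\t67890\tAG
rs0000003\t3\t13579\tCC
rs0000004\tX\t24680\tGT
"""


def parse_genotype_file(handle):
    """Return a list of (rsid, chromosome, position, genotype) tuples."""
    records = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        rsid, chrom, pos, genotype = line.split("\t")
        records.append((rsid, chrom, int(pos), genotype))
    return records


records = parse_genotype_file(io.StringIO(SAMPLE))

# Even a simple summary (SNPs per chromosome) is easier to take in
# than the raw rows themselves.
per_chromosome = Counter(chrom for _, chrom, _, _ in records)
for chrom, count in per_chromosome.items():
    print(f"chromosome {chrom}: {count} SNP(s)")
```

For a real file, pass an opened file handle instead of the io.StringIO wrapper.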
So what would be useful?
This depends on who you ask;
What does a clinician need?
What does a normal person need?
What does a scientist need? Among researchers, do they all want the same thing?
- A clinician wants high-level information of direct relevance to the patient’s health: something that they can use in their diagnosis.
- A normal person wants actionable information that is easy to interpret, with explanations and lifestyle advice; they don’t need details of the science.
i.e. they just want to know what to look out for in their diet, and what changes to make to their lifestyle.
- A scientist needs detailed information that can reveal new biological insights, and is typically dealing with genomes from more than one individual. *
* Even among researchers, different people have different aims: a high-level view for population-wide studies; a sequence-level view if looking at SNPs, etc.
Consumer setting (urine-based tests)
What works for consumers?
What works for clinicians?
For consumers: definitive indication of one state – no interpretation required.
For clinicians: Multiple concentration values – disease relevance determined by the clinician.
The amount of data being visualised here is tiny compared to what’s available in the human genome – for genomic data we must move to computer-based reports.
How are genotype results presented in direct-to-consumer genotyping services such as 23andMe?
Until recently what level of information was given?
- Raw SNP data is processed to provide context prior to being shown in tabular form.
- Algorithms are used to calculate disease risk based on the status of known SNPs in the individual.
- Until recently they gave a lot of information about the results from the genome; this stopped when regulation was imposed on them by the FDA.
- People want to know about what diseases and drug responses they can infer from their genomes.
A history of data visualisation
2,600 BC: World’s first known data table
10th Century: Position of the planets over time (unknown).
… nothing for ~800 years …
1669: Median remaining lifetime as a function of age (Christiaan Huygens graph of data from John Graunt)
1822: Price of wheat (bar) compared to weekly wage (red line) over several hundred years by William Playfair.
1829: Crime rate indicated by shaded regions (Adriano Balbi and André Michel Guerry).
1977: PRIM-9 - early interactive data visualisation (John Tukey).
1987: “Brushing Scatterplot” - An interactive multi-part graph for desktop computers (Richard Becker and William Cleveland).
1994: Chromoscope E. coli genome viewer (Zhang et al.). Desktop-based.
2001: UCSC Human Genome Browser (Kent et al.). Web-based.
2013: RCircos (Zhang et al.)
What do all visualisations have in common?
What does the viewer need?
All these visualisations aim to show large amounts of data to the viewer with minimal cognitive load (i.e. as easily as possible).
In each case, the viewer needs to be taught how to understand the visualisation, e.g. what the elements and colours represent.
Why bother with graphics?
Graphics can reveal trends hidden by summary statistics
All of these plots share the same summary statistics.
Anscombe’s quartet (1973): each of these four datasets has almost exactly the same:
- Number of points
- Mean average x and y
- Variance
- Correlation coefficient
- Straight line of best fit
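The claim can be checked numerically. The sketch below computes the listed statistics for the four datasets using only plain Python; the data values are the standard published quartet, and the function and variable names are just illustrative.

```python
from math import sqrt

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return the summary statistics listed in the notes for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares line of best fit
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed summaries are practically indistinguishable, yet the four scatter plots look completely different, which is the point of plotting the data rather than relying on the statistics alone.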
We still rely on graphs to visualise things to make sense of them.
What does this graph tell us?
This graph tells us that the level of this type of protein has increased, and we can also see that the increase is sigmoidally shaped. No fancy maths is needed.
We can also see that there was already some protein in the sample before we even started.
6.2 Visualisation
It was not that long ago that people started using colour to represent quantitative information, such as in 1829: crime rate indicated by shaded regions (Adriano Balbi and André Michel Guerry).
Using colour
Examples of this;
Simple univariate shading (only one variable is shown here: the cancer incidence), like that French map. The colour itself is only used for aesthetic reasons (BBC brand in this case).
Colour maps can increase the resolution of shading by mapping values to a wider range of colours. But this is still univariate, and can be confusing.
RGB colour mapping
Different primary colours can be mixed together to make other colours; this is the basic principle of how this works.
We can cram in more information by mixing red, green and blue in different proportions.
What can the colour be used to indicate in transcriptomic data from microarrays?
Mixing varying amounts of red and green is very common for visualising differential gene expression. Last week we looked at transcriptomic data and microarray spots: the amount of red indicated the abundance of a transcript in one sample and the amount of green its abundance in another, so mixing them together gave an idea of the relative expression in the two samples, including the extremes.
We can show three variables if we add blue – in a 3D style
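As a rough illustration of the idea, the sketch below maps up to three variables onto the red, green and blue channels. The linear scaling, the max_value parameter and the function names are assumptions made for this example, not how any particular microarray software does it.

```python
def values_to_rgb(red_value, green_value, blue_value=0.0, max_value=1.0):
    """Map up to three variables to an (R, G, B) tuple of 0-255 integers.

    Each variable is clipped to the range 0..max_value and scaled linearly.
    """
    def scale(value):
        clipped = max(0.0, min(float(value), float(max_value)))
        return int(255 * clipped / max_value + 0.5)  # round half up

    return (scale(red_value), scale(green_value), scale(blue_value))


def to_hex(rgb):
    """Format an (R, G, B) tuple as a hex colour string such as '#ffff00'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Two-sample case, as with red/green microarray spots:
only_sample_1 = values_to_rgb(1.0, 0.0)   # pure red
only_sample_2 = values_to_rgb(0.0, 1.0)   # pure green
both_samples = values_to_rgb(1.0, 1.0)    # red + green gives yellow

# A third variable can be carried by the blue channel:
three_variables = values_to_rgb(1.0, 0.5, 1.0)

print(to_hex(only_sample_1), to_hex(only_sample_2), to_hex(both_samples))
```

A spot expressed equally in both samples comes out yellow, which is why yellow spots on a red/green array indicate similar abundance in the two samples.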
What does HSV mean?
HSV colour mapping
RGB colouring can be difficult to interpret.
HSV (hue, saturation and value) can potentially show three variables in a more intuitive way.
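A small sketch of HSV mapping is shown below, using the colorsys module from the Python standard library to convert to RGB for display. Which variable goes on which of hue, saturation and value is a design choice; the function name and the assumption that the variables are already scaled to 0..1 are just for illustration.

```python
import colorsys


def hsv_colour(hue_var, saturation_var, value_var):
    """Map three variables (each assumed scaled to 0..1) to an RGB tuple.

    hue_var        -> which colour (position around the colour wheel)
    saturation_var -> how vivid the colour is (0 = grey)
    value_var      -> how bright the colour is (0 = black)
    """
    def clamp(x):
        return max(0.0, min(float(x), 1.0))

    r, g, b = colorsys.hsv_to_rgb(
        clamp(hue_var), clamp(saturation_var), clamp(value_var)
    )
    return tuple(int(255 * channel + 0.5) for channel in (r, g, b))


print(hsv_colour(0.0, 1.0, 1.0))  # fully saturated, bright red
print(hsv_colour(0.5, 1.0, 1.0))  # same saturation and brightness, cyan hue
print(hsv_colour(0.5, 0.0, 1.0))  # zero saturation: white whatever the hue
```

Because each variable changes one perceptual property (colour, vividness, brightness), a viewer can often read them more independently than three mixed RGB channels.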
Transparency
Transparency can be useful when we want to indicate the confidence in a particular piece of data.
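One way to picture this is to treat a confidence score as the opacity (alpha) of a colour drawn on a white background, as sketched below. The 0..1 confidence scale and the function name are assumptions for illustration only; this is not a description of how GPMDB renders its data.

```python
WHITE = (255, 255, 255)


def fade_by_confidence(rgb, confidence, background=WHITE):
    """Blend a colour over a background, using confidence (0..1) as opacity.

    confidence = 1 gives the full colour; confidence = 0 gives the background.
    """
    alpha = max(0.0, min(float(confidence), 1.0))
    return tuple(
        int(alpha * channel + (1.0 - alpha) * back + 0.5)  # round half up
        for channel, back in zip(rgb, background)
    )


RED = (255, 0, 0)
high_confidence = fade_by_confidence(RED, 1.0)    # solid red
medium_confidence = fade_by_confidence(RED, 0.5)  # washed-out red
no_confidence = fade_by_confidence(RED, 0.0)      # invisible on white

print(high_confidence, medium_confidence, no_confidence)
```

Low-confidence data therefore fades into the background, while high-confidence data stands out.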
Consider this protein coverage data from GPMDB: