Week 1.1 Data Vis & 1.2 Guiding Principles Flashcards

1
Q

Defining Data Visualisation (4 Points)

RPVA - Ryan Paints Visual (p) Art

**Included in Readings**

A

Representation, Presentation, Visual perception, Amplify cognition RPVA

Representation: There isn’t much we can discern from raw, unprocessed data. However, if we can represent data in forms that we are familiar with, like geometric objects, we can start to gain insight. Data visualisation represents data in a visual form ready for our brains to process.

Presentation: Careful presentation of data is necessary to ensure that the story behind the data comes to light. There are infinite choices and decisions that need to be made when presenting your visualisation.

Visual perception: Our brain is a very complex and powerful pattern recognition and processing machine. We can exploit our visual processing capabilities to quickly and accurately interpret data. Good data visualisation exploits our visual system and avoids its pitfalls.

Amplify cognition: Data visualisation should always inform and increase knowledge.

2
Q

Data Types (4 Points)

CN, O, I, R - Sea Noir

**Included in Readings**

A

Categorical or Nominal (Qualitative), Ordinal (Qualitative), Interval (Quantitative), Ratio (Quantitative).

Categorical or Nominal (Qualitative): Categorical variables are group variables, or categories if you will. There are no meaningful measurement differences such as rankings or intervals between the different categories. Categorical or nominal variables include binary variables (e.g. yes/no, male/female) and multinomial variables (e.g. religious affiliation, hair colour, ethnicity, suburb).

Ordinal (Qualitative): Ordinal data has a rank order by which it can be sorted, but the differences between the ranks are not relative or measurable. Therefore, ordinal data is not strictly quantitative. For example, consider the 1st, 2nd and 3rd place in a race. We know who was faster or slower, but we have no idea by how much. We need to look at the race times.

Interval (Quantitative): An interval variable is similar to an ordinal variable except that the intervals between the values are equally spaced. Interval variables have an arbitrary zero-point and therefore no meaningful ratios. For example, think about our calendar year and the Celsius scale; 1000 AD is not half of 2000 AD, and 20 degrees Celsius is not twice as “hot” as 10 degrees Celsius. This is because our calendar and Celsius scale have an arbitrary value for zero. Zero AD and zero degrees Celsius do not imply the presence of zero time or zero heat energy.

Ratio (Quantitative): A ratio variable is similar to an interval variable; however, there is an absolute zero point and ratios are meaningful. Examples are time given in seconds, length in centimetres, or heart beats per minute. A value of 0 implies the absence of the quantity being measured. We can also make statements like 30 seconds is twice the time of 15 seconds, 10 cm is half the height of 20 cm, and during exercise a person’s heart rate can almost double compared with rest. Zero heart rate, call 000!
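
The interval versus ratio distinction can be checked with a little arithmetic. The sketch below is illustrative only: the helper name `celsius_to_kelvin` is hypothetical, and kelvin is used as an example of a temperature scale with an absolute zero. It shows that differences survive the change of scale while the naive Celsius ratio does not.

```python
# Hypothetical helper: convert Celsius (interval scale, arbitrary zero)
# to kelvin (ratio scale, absolute zero).
def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15


# Dividing Celsius values gives a number, but it has no physical meaning
# because 0 degrees Celsius does not mean "no heat energy".
naive_ratio = 20 / 10

# On a scale with an absolute zero the ratio is meaningful, and it is
# nowhere near 2.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences are meaningful on both scales because the intervals are equal.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

print(f"naive Celsius ratio: {naive_ratio:.3f}")
print(f"kelvin ratio: {kelvin_ratio:.3f}")
print(f"difference (Celsius): {diff_celsius}, difference (kelvin): {diff_kelvin:.1f}")
```

In other words, 20 degrees Celsius is only about 3.5% "hotter" than 10 degrees Celsius once the scale has a true zero.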

3
Q

Study this Plot anatomy

A

Plot anatomy

4
Q

Guiding Principles (4 Points)

A

Strive for form and function

Justify the selection of everything you do

Creating accessibility through intuitive design

Never deceive the receiver

5
Q

What is a Target Audience?

A

Your target audience is broadly defined as who you are trying to communicate with.

Here are some things to consider about audience:

  • How diverse or broad is the audience? Are they relatively homogeneous (e.g. a group of engineers) or diverse (e.g. the general Australian population)?
  • How do they vary in terms of age, education, and other background factors?
  • How big is your audience? Small audiences allow for personalisation.
  • How technical is your audience? Can we assume they understand data visualisation? Do they have subject-knowledge expertise? Do they know about statistics?
  • Does the audience have any special requirements? Your audience might have colour-blindness, poor vision, cognitive impairments, English as a second language.
  • How much time does your audience have? Many people consider themselves time-poor.
  • What makes your audience tick? Understanding their interests and motivations can help you to engage them.
6
Q

Objective types (2 Points)

A

Explanatory, Exploratory

Explanatory: When the function is to explain (i.e. ‘presentation-orientated techniques’, Kosara (2016)) the visualisation is often carefully constructed around a narrative. Every feature of the visualisation has been carefully crafted to facilitate the telling of a compelling story.

Exploratory: When the function is to explore, a single story does not dominate and the focus of the visualisation promotes exploration and self-discovery of stories hidden in the data. Exploratory visualisations often make use of interactive features to help immerse the viewer in the data.

7
Q

What is a visualisation’s Tone?

A

A visualisation’s tone refers to features of the visualisation that are used to trigger an emotive response.

Kirk (2012) provides the following phrases to give you a sense of how you might communicate the objective of your visualisation:

  • persuade, shape opinion, inspire, change behaviour, shock, make an impact
  • learn, increase knowledge, answer questions, trigger questions, enlighten
  • conduct analysis, monitor, find patterns, no patterns, lookup
  • familiarise with data, play with data
  • tell a story, contextualise data
  • serendipitous discoveries
  • emphasise issues, grab attention
  • present arguments, assist decisions
  • experimentation
  • art, aesthetic pleasure, creative technique
8
Q

Data visualisation Storytelling Techniques (3 Types)

A

Comparisons and proportions

  • Range and distribution
  • Ranking
  • Measurement
  • Context

Trends and patterns

  • Direction
  • Rate of change
  • Fluctuation/variance
  • Significance
  • Intersections

Relationships and connections

  • Exceptions
  • Correlations/Associations
  • Clusters and gaps
  • Hierarchical relationships
9
Q

What is the difference between Representing and Presenting data?

A

Representation means choosing an appropriate visualisation method while taking into account the characteristics of the data, the story to be told, the audience and the degree of precision required (form vs function). There are many possible solutions; choosing the right representation might come down to personal preference or the requirements of the project.

Presentation is about visuals: the appropriate use of colour, interactive features (such as manipulating parameters, adjusting views, annotated details, animation), annotation (such as titles, introductions, user guides, labels, captions with narratives, visual annotations, legends and units, data sources and acknowledgements!), and arrangement.

10
Q

Trifecta Check-up (Q, D, V)

**In Notes**

A

Q. What is the question?

D. What does the data say?

V. What does the visual say?

All three questions should result in the same answer. Any discordance between two of these questions results in a poor visualisation.

  1. 6.1 (Q) What is the question? All data visualisations aim to answer a question using data. Without a question, a data visualisation doesn’t really have a point. Therefore, the Q in the trifecta sits at the top of the check-up. We use the question to evaluate the other two questions. The “How Popular is Your Birthday” visualisation answers a clear question, and many would agree (particularly those from the US), an interesting question. The answer to the question is likely to appeal to a wide audience and the data are sufficiently complex to be aided by a data visualisation. The ability to address this question with a data visualisation is a good objective.
  2. 6.2 (D) What does the data say? You can have a really good question, but fail to find the right data to answer the question. The D of the Trifecta check-up asks whether the data presented addresses Q. This often requires a designer to make many decisions about what data to use and how to clean, aggregate and transform data ready for visualisation. During the data stage, the decisions made by the designer will determine the success of the visualisation. The viewer must be able to connect the data with the question and be assured of its quality. If the data doesn’t connect with the question or questions are raised about the source or quality of the data, the data visualisation may fail the data question of the check-up. Looking back to the “How Popular is Your Birthday” example, the data visualisation includes annotations that reference the context of the data (U.S. daily birth rates from 1994 - 2014). We can read that the source of the data was the U.S. Census data (generally very reliable), and also a note on how birth rates were transformed to reflect the average between 1994-2014. The data also directly relates to the question.
  3. 6.3 (V) What does the visual say? You can have a question and good data, but unless you can visually communicate the answer using an effective visualisation method, a data visualisation may fail the V of the check-up. Again, there are many ways to visualise the same data: some will be excellent, some will be OK and some will be plain wrong. The challenge for the designer will be to link an appropriate method with the type of data and the question being addressed. “How Popular is Your Birthday” does an excellent job of answering the V question. A heat map, with days of the month on the x axis and month on the y axis, presents a familiar, almost calendar-like, grid. A discrete colour scale is used to visualise the magnitude of the average birth-rate for a particular day. While colour scales lack visual accuracy, the viewer can still glean the high density of births between July and September. These correspond to conception times in cooler months of the year and during the Christmas holiday period. The interactive version of the plot has a hover-over effect where the viewer can read the actual average values. Overall, this data visualisation brings a whole new meaning to the Christmas holiday period in the U.S.
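
The data preparation behind a calendar-style heat map like the one described can be sketched roughly as follows. This is not the method used for “How Popular is Your Birthday”; the records are made-up values, and the function names (`average_by_day`, `to_grid`, `colour_bin`) and the choice of five colour bins are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical records: (year, month, day, births). All values are invented.
records = [
    (2000, 9, 9, 12300),
    (2001, 9, 9, 12100),
    (2000, 12, 25, 6500),
    (2001, 12, 25, 6700),
    (2000, 7, 4, 8800),
]


def average_by_day(rows):
    """Average births for each (month, day) across all years."""
    totals = defaultdict(lambda: [0, 0])  # (month, day) -> [sum, count]
    for _year, month, day, births in rows:
        totals[(month, day)][0] += births
        totals[(month, day)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def to_grid(averages):
    """12 rows (months, y axis) by 31 columns (days, x axis); None = no data."""
    grid = [[None] * 31 for _ in range(12)]
    for (month, day), value in averages.items():
        grid[month - 1][day - 1] = value
    return grid


def colour_bin(value, low, high, n_bins=5):
    """Map a value to a discrete colour index from 0 (lowest) to n_bins - 1."""
    if value is None:
        return None
    if high == low:
        return 0
    position = (value - low) / (high - low)
    return min(int(position * n_bins), n_bins - 1)


averages = average_by_day(records)
grid = to_grid(averages)
low, high = min(averages.values()), max(averages.values())
binned = [[colour_bin(value, low, high) for value in row] for row in grid]

print(binned[8][8])    # colour bin for 9 September
print(binned[11][24])  # colour bin for 25 December
```

Binning the averages into a handful of discrete levels is what makes the colour scale discrete; it also illustrates why exact values are hard to read from colour alone and why a hover-over with the actual averages helps.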
11
Q

Ethical Principles

A

Beneficence, Transparency, Accuracy, Objectivity, Respect, Accountability

  1. 8.1 Beneficence: Your data visualisation must serve a valuable purpose by succinctly and accurately representing data in a way that leads to new knowledge and better decision making. Visualisations that don’t have a clear purpose, use unreliable data, misrepresent the truth, and deceive or confuse the viewer can be said to be maleficent. Maleficent data visualisations can be trivial data visualisations which waste people’s time or dangerous data visualisations that misrepresent the truth (e.g. fake news). It might be hard to think that a data visualisation can be immoral (Skau 2012), but the flaws of humanity never disappoint.
  2. 8.2 Transparency: Correll (2019) referred to transparency as making the invisible visible. Many stages of data visualisation are not visible to the viewer. The viewer sees the “end product” and has little insight into data collection, data preprocessing, the numerous designs tested (or ignored) and the technology used to generate the visualisations (Correll 2019). This means the viewer places a high degree of trust in the designer. In order to earn this trust you should always document and be able to explain and justify these invisible steps. You need to prove you can be trusted. Sharing your data visualisation project code and datasets (assuming you have permission) and correctly attributing external data sources promotes transparency. This will allow others to verify your data and designs, reproduce your work, and produce alternate visualisations. Transparency and reproducibility are cornerstones of scientific research, and data visualisation should be no different. There are limitations to transparency. Depending on your data and topic, you might be restricted in what you can disclose. However, you should still keep this principle in mind because it will help you go back over your work at a later time and allow you to show technical details to others who have permission.
  3. 8.3 Accuracy: Accuracy refers to the overall validity of our data visualisation. It relates to the quality of the data used, the rigour of data preprocessing and statistical analysis undertaken, and the method and choices used to represent the data. It also extends to the choice of variables visualised and how they relate to the objective of the visualisation. Your designs must be able to be verified and withstand critique. Where limitations are present, you raise the caveats and avoid overstating the findings.
  4. 8.4 Objectivity: When designing data visualisations we need to be aware of how our own personal expectations, biases and experiences can shape decisions and design. For example, relying on an easy-to-access dataset can introduce bias, because convenience leads to an over-representation of data visualisations using the same source. All data has some degree of bias or limitations, so data visualisation can propagate a bias. To be objective, you should choose the best data needed to achieve your objective and minimise sources of bias such as convenience. You should also maintain an open, but sceptical mindset. Data visualisation can often uncover unexpected results and you need to be careful not to suppress them because they don’t fit with your preconceived ideas. We also need to be equally careful to question and validate outcomes that fit with our expectations. Research has found that we are far less critical of facts that fit with our world view, a phenomenon known as confirmation bias (see Nickerson 1998). Bias can creep in during all stages of our designs. For example, removing outliers because they ruin the appearance of a plot, removing subgroups of data because they don’t fit well with the story you want to tell, or failing to explain important context behind the data that will impact the viewer’s interpretation. Consider the following data visualisation of criminal offending in Victoria between 2010 and 2019 (Crime Statistics Agency 2019). Now assume you have just read a news article stating that Victorian crime is out of control, the usual rhetoric you get from politicians. If this fits with your experience and belief, you might not think critically about the following plot. Crime is clearly on the rise.
  5. 8.5 Respect: When practising data visualisation you need to respect your position of power, the rights of others and the law. We have already looked at examples of how unethical data visualisations can be used to misinform others in order to promote ideological and political agendas. Data visualisation has power because it can present very powerful ideas succinctly and accessibly. For example, studies have shown that the mere presence of a data visualisation can add instant credibility to information being presented about the efficacy of medication (Tal and Wansink 2016). You must respect that power and do your best not to abuse it, especially when your audience might lack the knowledge and training to critically interpret a data visualisation. You must also be aware that other people may use your data visualisations in unintended and unethical ways. You must commit to respecting the rights of individuals and the law, especially privacy and copyright. Your designs must avoid bias towards others, especially in respect to ethnicity, religion, gender, age, sexual orientation, or disability. This doesn’t mean avoiding these topics. In fact, data visualisation is a powerful way to draw attention to many issues of discrimination (see the gender pay gap visualisation from Kommenda, Barr, and Holder (2018) in Figure 1.13 for an example). However, when dealing with sensitive topics, we need to be especially careful so as to avoid contributing to the problem.
  6. 8.6 Accountability: You are accountable for your designs. You take credit where credit is due, and you are responsible when you make a mistake or do not achieve your objective. You strive to always improve and continue learning. When you are doing something outside your area of expertise or experience, you take steps to learn the required skills, seek supervision from someone qualified and get feedback from experts.
12
Q

Data Integrity (P, S, C, P, Si, Dq)

** In Notes **

A

Permission, Security, Consent, Privacy and Sensitive Information, Data Quality

1.9.1 Permission

The first thing you need is permission to access and use data for the purpose of visualisation. This might be simple to ascertain, for example, the data come from your workplace and you require it to complete your job. Sometimes it is not clear. For example, you do not automatically have permission to use data published on a website. Check the website’s policies or whether the data have a licence for reuse. For example, many sites that publish data have a Creative Commons licence that will clearly outline how the data can be used and shared. If you cannot find any information on a licence for the published data, contact the site and seek written permission.

Some sites ask you to submit requests for permission to access data sources. This allows the data owner to audit and control access. You might be asked questions about your identity, who you represent, what you intend to do with the data, how you will store it, and who else will have access. Be truthful or you might risk violating a policy or licence, which can have legal implications. Sites will also sell data, in which case the purchase effectively buys a licence. It can be a bit of a minefield understanding licences and permission. The important thing is that you take reasonable steps to verify permission before you start a visualisation.

1.9.2 Security

Once you have your data, you are responsible for security if the licence or conditions of use require it. Again, this might be a simple task. For example, working on company data, using company computers and servers. However, can you copy the data to a portable drive and work on it at home? Maybe, maybe not. Get permission. What if you lose the portable drive or someone steals it? Are the data encrypted and password protected? Is your password secure? Accessing data remotely using databases is more secure. However, what if your computer at home is compromised and your data are stolen? What if your computer hardware fails and the data are destroyed? Do you have a back-up? How long will you retain the data after a project is complete? Security is your responsibility.

1.9.3 Consent

Informed consent is a complex ethical issue that relates to an individual’s voluntary permission to collect, use or disclose their personal data. Consent is needed prior to the collection of data. Consent must be informed. Informed consent is when the individual providing consent is fully aware of the purpose and risks associated with collection of their data and has the capacity to make an informed decision (National Health and Medical Research Council, Australian Research Council, and Universities Australia 2018). For example, obtaining informed consent from minors and people with cognitive impairments often requires consent from a guardian.

Consent is still relevant for previously collected data used for a secondary purpose (i.e. a purpose that wasn’t explained to the individual when consent was first gained). If it is reasonable to assume that an individual would consent to the secondary use of data and the data are anonymous, consent can sometimes be assumed.

The Cambridge Analytica scandal is a relevant case study here. Facebook requires users to sign a user agreement, which outlines how Facebook will collect, store and use user data. Users must agree to this before they can access the site. Facebook will claim that they use this data for operational purposes, for example finding friend connections, making content recommendations and targeted advertising (revenue). Most users consider this a reasonable way to use their data in exchange for access to Facebook’s powerful social media services. However, Cambridge Analytica harvested that data for a different purpose (political) without the users’ knowledge and Facebook sat on the information. Therefore, the use of Facebook data by Cambridge Analytica did not have informed consent, nor was it reasonable to assume that the users would ever consent to the use of their data in this way.

1.9.4 Privacy and Sensitive Information

Most countries have privacy laws that aim to protect personal and sensitive information about individuals. In Australia, the Privacy Act (“Privacy Act” 1988) defines personal information as follows:

…information or an opinion, whether true or not, and whether recorded in a material form or not, about an identified individual, or an individual who is reasonably identifiable.

Examples of private information include names, health records, phone numbers, finance information, and internet usage data. Anonymous data are not private. Companies often need to collect private information for the purpose of running their business. For example, hospitals need medical histories to ensure their patients receive proper care. With the consent of an individual, this information can be collected, stored, used and disclosed in accordance with the Privacy Act. Companies that come under privacy legislation in Australia have to abide by a set of privacy principles that relate to transparency of data collection and management, the right to anonymity (if practical), use and disclosure of personal information, maintenance of data, data security and the right of an individual to correct information (Office of the Australian Information Commissioner 2019). Not all private information is equal. The Privacy Act 1988 has even more stringent rules about the use of sensitive information such as health, ethnic origin, political opinions, religious beliefs, sexual orientation, and criminal records.

1.9.5 Data Quality

The quality of a data visualisation can only be as good as the data source. As Cairo (2014) explains: “Stories are sometimes built without assessing the quality of their sources or applying proper reporting and analysis methods. This can lead to disastrous results.” (p. 26) Take the time to locate and identify quality data sources. Here are some tips:

  • Data taken from primary sources are more reliable than data from secondary sources. Primary sources are those that originally collected the data. If another individual or organisation republishes the data, it is a secondary source. Don’t be lazy. Track down the original source and confirm the data for yourself.
  • Use reliable sources which can be trusted. Reliable sources have the following characteristics:
      • Identify who collected the data (qualifications and authorisation)
      • Provide clear details on how and when the data were collected, including sampling
      • Disclose potential conflicts of interest
      • Use quality control processes
      • Data have been collected in an ethical way
      • A data dictionary has been provided to help users understand the dataset
  • Be wary of sample size.
  • Use up-to-date data or data relevant to your problem.
  • Use variables that have known reliability and validity.
  • Check missing values. If there appears to be a lot, make sure you understand the reasons before using the data. Some degree of missing data is expected, but a quality data source will provide details.
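
The tip about checking missing values can be turned into a quick habit. The following is a minimal sketch using only the Python standard library; the sample CSV, the column names, the `missing_proportions` function and the set of tokens treated as missing are all assumptions made for illustration, not part of the course notes.

```python
import csv
import io

# Hypothetical dataset, inlined so the sketch is self-contained.
sample = """suburb,age,income
Carlton,34,72000
Fitzroy,,65000
Richmond,41,
Brunswick,29,58000
"""

# Assumption: these tokens mark a missing value. Check the data dictionary
# of a real source to find out how missing values are actually coded.
MISSING_TOKENS = {"", "NA", "N/A", "null"}


def missing_proportions(csv_text):
    """Return the proportion of missing values for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in counts:
            value = row[name]
            if value is None or value.strip() in MISSING_TOKENS:
                counts[name] += 1
    return {
        name: (count / n_rows if n_rows else 0.0)
        for name, count in counts.items()
    }


proportions = missing_proportions(sample)
for column, proportion in proportions.items():
    print(f"{column}: {proportion:.0%} missing")
```

A high proportion for any column is a prompt to go back to the source documentation and find out why the values are missing before visualising the data.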

13
Q

Storytelling Strategies (G, A, Ds)

** In Notes **

A

Genre, Approach, Design Strategies

Genre

Genre establishes the framing of the visualisation and how each element or story idea will appear or be presented to the viewer. For example, in an annotated chart, the narrative is contained within a single visualisation, while in a slide show, multiple frames are used to present text, visualisations and other supporting content to tell a more in-depth story. Determining which genre to use comes down to the story and situation.

Approach

Author-driven approaches employ a linear ordering of ideas, extensive messaging (annotation, text, highlighting, etc.) and minimal interactivity on the user’s behalf. At the other end of the spectrum, reader-driven approaches have no clear ordering of information, minimal messaging and extensive interactivity (filtering, searching, changing views, etc.). Strict reader-driven approaches are not common because a user risks missing the point, so many narrative visualisations employ a hybrid approach by incorporating both author-driven and reader-driven elements.

Design strategies

The design of a narrative visualisation is broken into two major concepts, narrative structure and visual narrative. Narrative structure refers to design elements that determine how the narrative will unfold. This includes the ordering of elements (linear or user-directed), the degree of user interaction and control over the story and the use of messaging (annotations, introductory and summary statements).

The visual narrative refers to specific strategies used to elicit a narrative experience. This includes the following visual elements that direct a user’s attention to important points and transitions between frames as well as the user’s position within a story:

Annotation

Visual highlighting

Consistent visual platforms

Details on demand

Timeline Sliders

Tacit tutorial
