Exam Flashcards

1
Q

What is Data Visualization?

A

Use of human visual perception to help us communicate data analytics.

2
Q

What are the goals of DV?

A

Exploratory Analysis
- Starting point: We intend to discover new knowledge from the input data.
- Process: Explore the obtained visual representation and look for signs that could suggest indications of particular tendencies and relations.
- Results: Visualization of data that can form the basis of a hypothesis.

Confirmatory Analysis
- Starting point: We already have a hypothesis and an objective for the data.
- Process: Goal-oriented visual examination of the hypothesis.
- Results: Determine evidence for the acceptance or rejection of the pre-formulated hypothesis.

Presentation
- Starting point: Facts to be presented are fixed a priori in graphical display.
- Process: Choice of appropriate presentation techniques.
- Results: High-quality visualization of the facts.

3
Q

What are the take-aways for DV?

A

Perception
- Visual perception is subjective.
- How we represent our data is not independent of how others will understand it.
- Not all visual features are alike (e.g. color is different from length).

Data Analytics
- We need to show our data to others for them to believe our findings.
- Just the results of some metric can hide strong bias and outliers.
- Visual vocabulary can be used to encode information in a qualitative way that makes it easier to detect patterns and bias.

Communication
- Representation of data through visual cues can be used in different tasks.
- Not all tasks have the same requirements.
- Context is important!

4
Q

What is context in DV?

A

Who
Identify your decision-maker and audience

What
Focus on the actions you expect from your audience and adapt your communication to the mechanism used

How
Decide how your data will support the "what" (the action you expect from your audience).

5
Q

What are the different levels of detail?

A

Trying to get funding (Investors/ Board) < Brainstorming new ideas for project (Colleagues) < Designing KPI dashboard (Colleagues)

6
Q

What are Tufte’s Five Laws of Data-Ink?

A
  1. Above all else, show the data: focus on the data itself and presenting it clearly
  2. Maximize the data-ink ratio: maximize the proportion of ink (or pixels) used to represent the data compared to the total ink used in the graphic
  3. Erase non-data-ink: gridlines, background colors, and other elements that do not directly contribute to conveying the information should be minimized.
  4. Erase redundant data-ink: elements that repeat information already present in the data should be erased.
  5. Revise and edit: review and refine the visualizations to improve clarity and effectiveness.
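
Law 2 can be written as a ratio. The block below is a sketch of Tufte's definition in LaTeX; the second form restates it as the share of the graphic that cannot be erased without losing information.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{(proportion of the graphic that can be erased without loss of data-information)}
\]
```

A ratio close to 1 means almost every mark on the page carries data.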
7
Q

What are the Gestalt Principles of Visual Perception?

A

Proximity: Objects that are close to each other are perceived as forming a group.

Similarity: Objects that are of similar color, shape and size are perceived as being part of the same group.

Enclosure: If elements are enclosed together we see them as part of the same group.

Closure: When presented with an incomplete or partially obscured image, people tend to mentally fill in the missing information to perceive the whole.

Continuity: Lines or patterns that follow a smooth, continuous flow are perceived as more related and are grouped together.

Connection: We perceive objects that are physically connected as part of a group. Connection usually groups more strongly than similarity of color, size, or shape, though typically not more strongly than enclosure.

8
Q

What is Figure and Ground?

A

Figures are perceived to be in the foreground, while Ground is whatever lies behind the figure.

The figure is distinguished from the background by Gestalt laws.

9
Q

What are Preattentive Attributes?

A

Visual properties of an object or stimulus that the human brain can detect and process rapidly, effortlessly, and in parallel, without the need for focused attention.

These attributes are processed in the early stages of visual perception, often before conscious awareness kicks in.

E.g. Orientation, shape, line length, line width, size, curvature, added marks, enclosure, hue, intensity, spatial position, motion

10
Q

What are the meanings of colors?

A

Earthtones: Calming, sinks into the page.
Cool: Soothing, restful, calm.
Unnatural Colors: Alarming, unnerving, draws attention.
Warm: Optimistic, active, vivid.

Increasing Color Intensity: Draws the eye and means the point is more important.

11
Q

Color-Blind Guide

A

Some people see colors in different ways.
Blue is the safest color.
Green and red are hard to tell apart for many color-blind viewers.
Safer pairings are Blue/Orange or Blue/Red.

12
Q

Data Models vs Conceptual Models

A

Data Models: formal descriptions using mathematical operations.

Conceptual Models: mental constructions that include semantics to support reasoning.

Examples:
1D Float Number vs Temperature
3D Vector of Float Numbers vs Spatial Location (Coordinates)
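
A minimal Python sketch of the first example, assuming a hypothetical `Temperature` class and a "C"/"F" unit convention that are not part of the original card. The bare float is the data model; the class adds the semantics that make reasoning (such as unit conversion) possible.

```python
from dataclasses import dataclass

# Data model: a 1D float with no meaning attached.
raw_value: float = 21.5


@dataclass(frozen=True)
class Temperature:
    """Conceptual model: the same float plus the semantics needed to reason about it."""
    value: float
    unit: str  # "C" or "F" (hypothetical convention for this sketch)

    def to_celsius(self) -> "Temperature":
        # Conversion only makes sense because we know what the number means.
        if self.unit == "C":
            return self
        if self.unit == "F":
            return Temperature((self.value - 32) * 5 / 9, "C")
        raise ValueError(f"unknown unit: {self.unit}")


reading = Temperature(raw_value, "C")
```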

13
Q

What are the different data types?

A

1. Qualitative (Categorical):

a) Nominal:
- No quantitative relationship between categories
- Classification without ordering
- Example: Gender, nationality, type of animal

b) Ordinal:
- Attributes can be rank-ordered
- Distances between values do not have any meaning
- Example: Education, health, customer satisfaction ratings

2. Quantitative (Numerical):
- Attributes can be rank-ordered
- Distances between values have a meaning
- Mathematical operations are possible
- Example: age, temperature, and salary

a) Discrete: Product of counting (e.g. heart rate, number of siblings)

b) Continuous: Product of measuring; can take infinitely many values within a range (e.g. height and weight)
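
The data types above differ in which summaries are meaningful. The following Python sketch illustrates this with made-up values; the variable names and the three-level satisfaction scale are assumptions for illustration only.

```python
from statistics import mean, mode

# Nominal: only equality and counting make sense, so the mode is a valid summary.
animals = ["cat", "dog", "cat", "bird"]
most_common_animal = mode(animals)

# Ordinal: ranking is meaningful, distances are not, so use the median rank.
SATISFACTION_ORDER = ["low", "medium", "high"]  # hypothetical scale
ratings = ["high", "low", "medium", "high", "medium"]
ranks = sorted(SATISFACTION_ORDER.index(r) for r in ratings)
median_rating = SATISFACTION_ORDER[ranks[len(ranks) // 2]]

# Quantitative: distances are meaningful, so arithmetic such as the mean is valid.
siblings = [0, 2, 1, 3]             # discrete (counted)
heights_cm = [162.5, 171.0, 180.2]  # continuous (measured)
mean_siblings = mean(siblings)
mean_height = mean(heights_cm)
```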

14
Q

Priorities Table in Relation to Perceivable Visual Attributes

A

From most to least accurately perceived:

Quantitative Data: Position, Length, Angle, Slope, Area, Volume, Density, Color Saturation, Color Hue, Texture

Ordinal Data: Position, Density, Color Saturation, Color Hue, Texture, Connections, Containment, Length, Angle, Slope, Area, Volume

Nominal Data: Position, Color Hue, Texture, Connections, Containment, Density, Color Saturation, Shape, Length, Angle, Slope, Area, Volume
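
One way to use these rankings is as a lookup when assigning visual attributes to fields. The sketch below copies the three lists from this card into a dictionary; the function name `best_channel` and the `taken` parameter are hypothetical helpers, not part of any library.

```python
# Rankings copied from the card, ordered from most to least accurately perceived.
CHANNEL_RANKING = {
    "quantitative": ["position", "length", "angle", "slope", "area", "volume",
                     "density", "color saturation", "color hue", "texture"],
    "ordinal": ["position", "density", "color saturation", "color hue", "texture",
                "connections", "containment", "length", "angle", "slope",
                "area", "volume"],
    "nominal": ["position", "color hue", "texture", "connections", "containment",
                "density", "color saturation", "shape", "length", "angle",
                "slope", "area", "volume"],
}


def best_channel(data_type, taken=()):
    """Return the highest-ranked attribute for data_type that is not already in use."""
    for channel in CHANNEL_RANKING[data_type]:
        if channel not in taken:
            return channel
    raise ValueError("no free visual attribute left")
```

For example, once position is used for a quantitative field, `best_channel("nominal", taken={"position"})` falls back to color hue.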

15
Q

What are dashboards?

A

Visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so that the information can be monitored at a glance.

An information display designed for people to help maintain situational awareness.

Set of interactive charts (primarily graphs and tables) that simultaneously reside on a single screen, each of which presents a somewhat different view of a common dataset and is used to analyze that information.

16
Q

Support Attributes that should be present in a Dashboard

A

Information is presented using small, concise, direct, & clear display of media
- Clearly stated messages
- Each point should be limited to the space needed

Customized
- Tailored to the needs of a specific group or individual

Consistent layout
- Data changes over time
- Interface is consistent (until it’s time for the next revision)

17
Q

Dashboard Layout - Emphasis

A

Top left - Most emphasis
Top right & Bottom left - Neutral
Bottom right - Least emphasis

18
Q

Dashboard “Do’s”

A

Content position and size should match its importance and frequency of use.
Use color and formatting to draw attention where needed, rather than to decorate
Visually associate data and content that is related
Use the needs of the user to drive the layout, rather than forcing layout with an inflexible grid.
When deciding placement, consider how the eye will scan the page

19
Q

Why use Storytelling?

A

Stories solidify abstract concepts and simplify complex messages.
Stories are a universal language.
Stories inspire and motivate.
They help focus the audience's attention.

20
Q

Storytelling Structure

A

Stories must follow a clear pattern so that the audience recognizes it as a story.
The best stories are snapshots of a world improved by an action we want to promote.
You need to set up a story to show:

1. A problem to be solved, and a character that is affected by the problem.
Example: Your client wants to expand his gourmet street food truck chain to a new location in Europe.

2. A way to solve the problem.
Example: By focusing on markets with fewer mid-to-high-end restaurants, he could tap into that market at a lower cost.

3. The impact of solving the problem.
Example: He would increase revenues significantly.

21
Q

Chronological vs Lead with the Ending

A

After creating a story, you can guide your audience through its natural order (chronological) or by leading with the ending (better when the audience is already on your side).

22
Q

Stories to guide the project development

A

1. Formulate a plausible hypothesis.
Markets with a smaller percentage of expensive restaurants have better opportunities for my gourmet food truck

2. Define the key analysis that supports/disproves this hypothesis.
Identify examples of these markets. Identify the ratings of smaller businesses with food types similar to yours. If your hypothesis is correct, you should see better ratings in these markets.

3. Do those analyses and only those.
Double-check your reasoning. Get one or two insights that can lead to one or two new analyses or hypotheses (the fact that ratings are biased for those with fewer reviews, for example).

4. Stop! Finish writing your story and prepare your document.
Even if you disprove your hypothesis, that is a good insight and if the hypothesis is well formulated, you can still find a solution to the problem

23
Q

When presenting your work…

A

Use the titles to tell your story: Just by reading the slides you should be able to understand the message. Each title should be informative and affirmative.

Use white space to support the claim in your title: With graphs, diagrams, even text. Each slide should have only the content that supports the title.

24
Q

Design Approaches (4 A’s)

A

Design approaches (supported in 4 A’s) help us improve our communication.

Form follows function: first decide what you want your audience to be able to do with your data (function), then choose the visualization that makes this easiest (form).

Affordances: Aspects of the design that make it obvious how to use.

Accessibility: Design that is usable by people of widely varying technical skills.

Aesthetics: More visually appealing designs are perceived as easier to use and are more readily accepted.

Acceptance: For your design to be effective, it must be accepted by the intended audience.

25
Q

Visualizations To Be Avoided

A

Pie Charts

Pie charts are not a recommended visualization, because they can lead viewers to misjudge the data.

  • Human brains have difficulty comparing the sizes of angles and reading accurate values without a scale.
  • Problems are exacerbated when making 3D pie charts.

Exceptions: Only use them for percentage breakdowns

  • Each slice represents a certain percentage out of 100%.
  • Order the slices in size to make it easier to read.
  • Never use a pie chart if it has more than 5 slices.
  • NEVER make it 3D.
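
The exception rules above can be checked before drawing anything. This is a minimal Python sketch under stated assumptions: `prepare_pie_slices`, `max_slices`, and `tolerance` are hypothetical names, and the input is assumed to be a mapping from label to percentage.

```python
def prepare_pie_slices(shares, max_slices=5, tolerance=0.01):
    """Validate percentage shares against the card's rules and return them ordered by size."""
    if len(shares) > max_slices:
        raise ValueError(f"{len(shares)} slices: use another chart type above {max_slices}")
    total = sum(shares.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"shares sum to {total}, not 100%")
    # Largest slice first makes the chart easier to read.
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


ordered = prepare_pie_slices({"A": 20, "B": 50, "C": 30})
```

The function only prepares the data; the 3D rule is a rendering choice and is not something this sketch can enforce.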

Bar Chart Mistakes

The length of the bar should proportionally represent the magnitude of values.
Bar chart axes should include zero.
Do not use a stacked bar chart when a grouped one is needed.
Do not compare bars that represent different kinds of data.
Use a grouped bar chart when you want to compare values across sub-categories.
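
A short numeric illustration of why bar chart axes should include zero. The values 52 and 50 and the helper `drawn_length_ratio` are made up for this sketch; it compares how long one bar looks relative to another when the axis starts at zero versus at 48.

```python
def drawn_length_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar lengths when the value axis starts at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


honest = drawn_length_ratio(52, 50)         # axis starts at zero: bars look almost equal
truncated = drawn_length_ratio(52, 50, 48)  # axis starts at 48: first bar looks twice as long
```

With the truncated axis, a 4% difference in the data is drawn as a 100% difference in bar length, which breaks the proportionality rule.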

Line Chart Mistakes

Do not use line charts if the x value does not change or if the x values belong to different categories.
Do not use line charts to represent y-values at different scales (e.g. you may consider two y-axes, but with caution).

26
Q

Advanced Representations

A

Graph: Structure amounting to a set of objects in which some pairs of objects are in some sense “related”.

Vertices or Nodes: Represent the objects. They can represent different things, like people, companies, websites…

Edges: Represent the relation, and connect two vertices. They can represent family or friendship ties, commercial relations, links between sites…
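
A minimal Python sketch of these definitions, assuming an undirected graph and made-up names. Nodes are dictionary keys and each edge appears in the neighbor sets of both endpoints; `build_graph` is a hypothetical helper.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as an adjacency mapping: node -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Vertices are people; edges are friendship ties.
friendships = [("Ana", "Ben"), ("Ben", "Cy"), ("Ana", "Cy"), ("Cy", "Dee")]
graph = build_graph(friendships)
```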

27
Q

What is Community Detection?

A

Find the natural division of a network into groups of nodes such that there are many edges within the groups and few edges between them. Equivalent to clustering in ordinary datasets. In a drawn graph, communities are typically shown by node color.
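
The "many edges within, few edges between" criterion can be checked for a given grouping. The sketch below does not detect communities; it only scores a partition that is supplied by hand. The edge list, the `community_of` mapping, and the helper `edge_split` are assumptions for illustration.

```python
def edge_split(edges, community_of):
    """Count edges inside communities versus edges that cross between communities."""
    within = sum(1 for a, b in edges if community_of[a] == community_of[b])
    return within, len(edges) - within


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
within, between = edge_split(edges, community_of)
```

A good partition has a high `within` count and a low `between` count, as in this example.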

28
Q

What is Connectivity?

A

Property of both the whole graph and individual nodes. Highly connected graphs, where each node is related to several other nodes, have fewer communities, and diffusion events propagate faster through them. Individual nodes with more relations are hubs for diffusion and are most often influential in the network.
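
Two simple measures make this concrete: node degree for individual nodes and density for the whole graph. This is a sketch on a made-up star graph; `degrees` and `density` are hypothetical helpers, and density here assumes an undirected graph without self-loops.

```python
def degrees(edges):
    """Number of relations per node in an undirected edge list."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


def density(edges, node_count):
    """Fraction of all possible undirected edges that are actually present."""
    possible = node_count * (node_count - 1) / 2
    return len(edges) / possible


star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
node_degrees = degrees(star)
most_connected = max(node_degrees, key=node_degrees.get)
graph_density = density(star, node_count=5)
```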

29
Q

What is the Fruchterman-Reingold Algorithm?

A

Graph layout algorithm that aims to arrange the nodes (or vertices) of a graph in a visually pleasing way.

Picture each pair of connected nodes as if they are connected by springs.

  • The springs try to pull connected nodes closer together.
  • All nodes exert a repulsive force on each other. Between nodes that are not directly connected there is no spring to counter it, so they are pushed apart.
  • The algorithm balances these attractive and repulsive forces, creating a layout where connected nodes are close, and unconnected nodes are spread out.
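
The spring picture can be sketched in plain Python. This is a simplified version for illustration, not a reference implementation: it uses the usual force terms (repulsion k²/d between every pair, attraction d²/k along edges) and a linearly cooling step limit, but it skips the frame clamping of the original algorithm. The parameter names, defaults, and fixed seed are assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=200, size=1.0, seed=0):
    """Return {node: (x, y)} from a simplified force-directed layout."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = size / 10           # caps how far a node may move per step
    cooling = temperature / (iterations + 1)

    def delta(u, v):
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        return dx, dy, max(math.hypot(dx, dy), 1e-9)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy, dist = delta(u, v)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction: each edge pulls its endpoints together with force d^2 / k.
        for u, v in edges:
            dx, dy, dist = delta(u, v)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, but no farther than the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature -= cooling
    return {n: (p[0], p[1]) for n, p in pos.items()}


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = fruchterman_reingold(nodes, edges)
```

In the resulting layout, connected pairs should on average sit closer together than unconnected pairs, which is the balance the card describes.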
30
Q

Hierarchy

A

Hierarchical structure is most appropriate when, moving from level to level, the same kind of relationship is depicted, such as “parent-child” or “genus-species”. Often represented as a tree-like hierarchy.

31
Q

Radial Tree

A

Type of tree diagram used in data visualization to display hierarchical structures. In a radial tree, the root node is placed at the center, and its child nodes radiate outward in a circular or radial pattern.
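
One simple way to compute such a layout is to put each level on a circle whose radius grows with depth and to give each subtree an angular wedge proportional to its number of leaves. The sketch below follows that assumption; `radial_layout`, `ring_gap`, and the sample tree are hypothetical, and real tools may place nodes differently.

```python
import math


def radial_layout(tree, root, ring_gap=1.0):
    """Place root at the origin and each deeper level on a larger circle.

    tree maps each node to a list of its children.
    """
    def leaf_count(node):
        children = tree.get(node, [])
        return 1 if not children else sum(leaf_count(c) for c in children)

    positions = {}

    def place(node, depth, start, end):
        # A node sits in the middle of its wedge, at a radius set by its depth.
        angle = (start + end) / 2
        radius = depth * ring_gap
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
        total = leaf_count(node)
        cursor = start
        for child in tree.get(node, []):
            span = (end - start) * leaf_count(child) / total
            place(child, depth + 1, cursor, cursor + span)
            cursor += span

    place(root, 0, 0.0, 2 * math.pi)
    return positions


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": []}
positions = radial_layout(tree, "root")
```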

32
Q

Sankey Diagram

A

Depicts the flow of resources or information between multiple entities. It is particularly useful for illustrating the distribution, transfer, or transformation of quantities within a system. The key feature of a Sankey diagram is its use of proportional arrow or ribbon widths to represent the quantity of flow between different stages or categories.
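
The proportional-width rule is a simple scaling. In this Python sketch the flows, the `ribbon_widths` helper, and the `max_width` value (in arbitrary drawing units) are assumptions for illustration; the largest flow gets the full width and every other ribbon is scaled by its quantity.

```python
def ribbon_widths(flows, max_width=40.0):
    """Scale each flow so ribbon width is proportional to its quantity."""
    largest = max(flows.values())
    return {link: max_width * quantity / largest for link, quantity in flows.items()}


# (source, target) -> quantity flowing between them
flows = {
    ("Budget", "Salaries"): 600,
    ("Budget", "Rent"): 300,
    ("Budget", "Marketing"): 100,
}
widths = ribbon_widths(flows)
```

Because the scaling is linear, a flow that is half as large is drawn exactly half as wide.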

33
Q

Principles of Effective Visual Communication

A
  • Clarity: Make the main message easily understandable.
  • Simplicity: Focus on essential information and remove unnecessary elements.
  • Hierarchy: Establish a visual hierarchy to guide the viewer’s attention.
  • Consistency: Maintain a consistent visual style throughout the visualization.
  • Accessibility: Design for a diverse audience, including people with disabilities.
  • Honesty: Present data truthfully and avoid misleading representations.
34
Q

Jacques Bertin’s Visual Variables

A

Mark: Something that is visible and can be used to show relationships within sets of data.

Visual Variables: Basic building blocks with certain aesthetic characteristics for visual mapping. Choosing different visual variables for representing different aspects of the same information can greatly influence the perception and understanding of the presented information.

Bertin’s Visual Variables:
- Position
- Size
- Shape
- Value
- Hue
- Orientation
- Texture

Selective: Is X different from the others?
Associative: Is X like the others?
Quantitative: How much is the difference between X and Y?
Order: Is X more/greater/bigger/… than Y?
Length: How many different categories can we represent with this variable for a task?

35
Q

Data Analysis Flow in Tableau

A

Data check
Explore data
Analyze and visualize data
Dashboarding
Communicate insights

36
Q

Dimensions and Measures

A

Dimensions: Contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data. Dimensions affect the level of detail in the view.

Measures: Contain numeric, quantitative values that you can measure. Measures can be aggregated. When you drag a measure into the view, Tableau applies an aggregation to that measure (by default).
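
The interplay of dimensions and measures can be imitated outside Tableau. The Python sketch below is only an analogy, not how Tableau works internally: the rows, field names, and the `aggregate` helper are made up, and `sum` stands in for the default SUM aggregation.

```python
from collections import defaultdict

rows = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": 80.0},
    {"region": "North", "sales": 30.0},
]


def aggregate(rows, dimension, measure, agg=sum):
    """Group rows by a dimension and aggregate a measure within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: agg(values) for key, values in groups.items()}


sales_by_region = aggregate(rows, "region", "sales")
```

The dimension sets the level of detail (one result per region), while the measure is what gets aggregated.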

37
Q

Continuous and Discrete

A

Green measures and dimensions are continuous. Continuous field values are treated as an infinite range. Generally, continuous fields add axes to the view.

Blue measures and dimensions are discrete. Discrete values are treated as finite. Generally, discrete fields add headers to the view.

38
Q

Exploratory Data Analysis (EDA)

A

Used to understand the characteristics of a variable and its hidden relationship with other variables in the dataset.

This process makes observations about the data and summarizes it. All types of data can be considered, such as categorical, continuous, string, etc.

It can involve univariate, bivariate or multivariate analysis.

EDA is mainly used to:
- Improve understanding of the data
- Understand the importance of variables
- Identify outliers and missing values
- Formulate hypotheses
- Help prepare the data for modeling

39
Q

Common Steps in EDA

A
  1. Handle missing values
  2. Structure the data
  3. Identify the trends of features (data types, univariate analysis, bivariate analysis)
  4. Correlation analysis
  5. Outlier detection
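
The steps above can be sketched with the Python standard library alone. Everything in this block is an assumption for illustration: the tiny dataset, median imputation for missing values, Pearson correlation, and the 1.5 × IQR rule for outliers are common choices, not the only valid ones.

```python
from statistics import mean, median, quantiles

raw = [
    {"age": 25, "income": 30.0},
    {"age": 32, "income": None},  # missing value
    {"age": 41, "income": 52.0},
    {"age": 38, "income": 48.0},
    {"age": 29, "income": 35.0},
    {"age": 35, "income": 400.0},
]

# 1. Handle missing values: impute income with the median of the known values.
fill = median(r["income"] for r in raw if r["income"] is not None)
rows = [{**r, "income": fill if r["income"] is None else r["income"]} for r in raw]

# 2. Structure the data: one list per variable.
ages = [r["age"] for r in rows]
incomes = [r["income"] for r in rows]

# 3. Univariate summaries.
summary = {"age_mean": mean(ages), "income_median": median(incomes)}


# 4. Correlation analysis (Pearson coefficient, computed by hand).
def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


age_income_corr = pearson(ages, incomes)

# 5. Outlier detection with the 1.5 * IQR rule.
q1, _, q3 = quantiles(incomes, n=4)
iqr = q3 - q1
outliers = [x for x in incomes if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
```

Here the income of 400.0 is flagged as an outlier, which also shows why the median is a safer imputation value than the mean for this column.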