Lecture 2: Data Foundations and Tasks Flashcards

1
Q

When should you not visualize?

A

When you have a well-defined question on a well-defined dataset

-> Use statistics / machine learning instead

2
Q

Where does insight generation come from?

A

Humans

3
Q

Pros of having a computer in the loop?

A

Scale
‣ Drawing by hand infeasible
‣ Interaction allows users to ‘drill down’ into data
‣ Integration with algorithms
Efficiency
‣ Re-use charts for different datasets
Quality
‣ Precise data-driven rendering
Storytelling

4
Q

Why use Interactivity?

A

Limitations of people and displays
Single static view can only show one aspect.

5
Q

Name methods of Data Acquisition (Raw Data)

A

Measurements,
Modeling/Simulation,
Artificial

Measurements
‣ Real world data
‣ e.g.: computed tomography (CT) / magnetic resonance (MR), lab results, production sensor data
Modeling / Simulation
‣ e.g.: flow visualization, biological processes (pathways), climate change model, engine model
Artificial
‣ Human generated data
‣ e.g.: social networks, text, painting, movie, workflows

6
Q

How can we handle missing data values?

A

Discard bad records,
Assign sentinel value
Assign average value
Assign value based on nearest neighbor
Communicate in visualization

Discard bad records
‣ Commonly applied
‣ Con: loss of data
Assign sentinel value
‣ e.g., -1, NaN
‣ Needs to be handled when statistics are applied
Assign average value
‣ Pro: affects statistics minimally
‣ Con: non-existent data values are introduced
Assign value based on nearest neighbor
Communicate in visualization
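
The first four strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `readings` list and all function names are made up for the example, and "nearest neighbor" is interpreted here as the closest known value by list index, which is only one possible reading.

```python
import math
from statistics import mean

# Hypothetical sensor readings; None marks a missing value.
readings = [20.5, None, 21.0, 22.5, None, 24.0]

def discard(values):
    """Discard bad records (simple, but loses data)."""
    return [v for v in values if v is not None]

def fill_sentinel(values, sentinel=math.nan):
    """Assign a sentinel; later statistics must skip it explicitly."""
    return [sentinel if v is None else v for v in values]

def fill_average(values):
    """Assign the mean of the known values; the overall mean is unchanged."""
    avg = mean(discard(values))
    return [avg if v is None else v for v in values]

def fill_nearest(values):
    """Assign the closest known value by index (assumed notion of 'nearest')."""
    known = [i for i, v in enumerate(values) if v is not None]
    return [values[min(known, key=lambda k: abs(k - i))]
            for i in range(len(values))]
```

Note that `fill_average` leaves the mean untouched but introduces values that were never measured, which is exactly the trade-off listed on the card.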

7
Q

What is done during Data Processing & Cleaning (in order)?

A

Handling Missing Values, Normalization, Sanity Check, Data Reduction (Filtering, Aggregation), Data Transformation/Mapping

8
Q

What is Normalization?

A

Allows comparison of seemingly unrelated data.
Transform the dataset so that the results satisfy a particular statistical property.
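
Two common normalizations, sketched under the assumption that the "statistical property" is either a fixed range (min-max) or zero mean and unit standard deviation (z-score). The card does not name a specific method, and the sample data and function names are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale to [0, 1]. Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical attributes with different units.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]

# After normalization both attributes share one scale and can be compared.
heights_norm = min_max(heights_cm)
weights_norm = min_max(weights_kg)
```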

9
Q

Why is a Sanity Check important?

A

To catch impossible data values. Beware of (wrong) assumptions about the data.
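
A sanity check can be as simple as testing each value against a plausible range. The sketch below assumes hypothetical records and hand-picked ranges; in practice the ranges themselves encode assumptions and should be reviewed.

```python
# Hypothetical records and plausibility ranges (both are assumptions).
records = [
    {"age": 34, "body_temp_c": 36.8},
    {"age": -2, "body_temp_c": 37.1},
    {"age": 51, "body_temp_c": 370.0},
]
valid_ranges = {"age": (0, 120), "body_temp_c": (30.0, 45.0)}

def find_violations(records, valid_ranges):
    """Return (record index, attribute, value) for every out-of-range value."""
    violations = []
    for i, rec in enumerate(records):
        for attr, (lo, hi) in valid_ranges.items():
            if not lo <= rec[attr] <= hi:
                violations.append((i, attr, rec[attr]))
    return violations

violations = find_violations(records, valid_ranges)
```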

10
Q

Data Reduction

What is Filtering?

A

Eliminating some items or attributes

11
Q

Data Reduction

What are some Approaches of Data Filtering?

A

‣ User-defined attributes / criteria
* Clipping (min, max)
* Threshold value (cut-off value)
* Interactive filtering/zooming
‣ Sampling
* e.g., take every xth element, random
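
The non-interactive approaches above might look like this in Python. Clipping is interpreted here as dropping items outside a [min, max] range (not clamping them), and all names and data are illustrative assumptions.

```python
import random

values = [3, 12, 7, 25, 18, 1, 30, 9]

def clip_range(values, lo, hi):
    """Clipping: keep only items inside [lo, hi]."""
    return [v for v in values if lo <= v <= hi]

def threshold(values, cutoff):
    """Threshold: keep only items at or above the cut-off value."""
    return [v for v in values if v >= cutoff]

def every_nth(values, n):
    """Systematic sampling: take every n-th element, starting with the first."""
    return values[::n]

def random_sample(values, k, seed=0):
    """Random sampling without replacement; seeded for reproducibility."""
    return random.Random(seed).sample(values, k)
```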

12
Q

Data Reduction

What is Data Aggregation?

A

Representing a group of items/attributes by a new item/attribute

13
Q

Data Reduction

What are some Types of Data Aggregation?

A

Item aggregation
‣ Using statistics
e.g., average, min/max, count, sum
‣ Clustering
Attribute aggregation
‣ Dimensionality reduction aka embeddings / projections
e.g., t-SNE, PCA, UMAP
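
Item aggregation using statistics can be sketched as a group-by followed by summary functions. The `sales` rows and the grouping key are hypothetical. Attribute aggregation (t-SNE, PCA, UMAP) is left out because it needs a numerical library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical items: (region, sales amount).
sales = [("north", 120.0), ("south", 80.0), ("north", 100.0),
         ("south", 60.0), ("north", 140.0)]

def aggregate(rows):
    """Represent each group of items by one new item of summary statistics."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {"count": len(v), "sum": sum(v), "mean": mean(v),
              "min": min(v), "max": max(v)}
        for key, v in groups.items()
    }

summary = aggregate(sales)
```

Five items are reduced to two, one per region, each carrying the count, sum, mean, min, and max of its group.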

14
Q

How does Data Transformation / Mapping work?

A

In data space
‣ Convert from source data system to target data system
e.g., temperature conversion
In visual space
‣ Mapping of data to geometric primitives (points, lines, etc.) and their attributes (color, position, size, etc.)
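
Both kinds of mapping can be illustrated with small functions. The temperature conversion is the data-space example from the card; the linear scale is one assumed way to map a data value to a visual attribute such as a pixel position or a point radius. The domains and ranges are made up.

```python
def celsius_to_fahrenheit(c):
    """Data space: convert from the source unit system to the target one."""
    return c * 9 / 5 + 32

def linear_scale(domain, output_range):
    """Visual space: build a function mapping data values to a visual attribute."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale

# Hypothetical encodings: temperature (0..40 °C) -> x position and point radius.
x_position = linear_scale((0, 40), (0, 400))   # pixels
point_radius = linear_scale((0, 40), (2, 10))  # pixels
```

For example, a reading of 20 °C would be drawn at `x_position(20)` with radius `point_radius(20)`.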

15
Q

Name data types

Structural interpretation of data

A

Items
Attributes
Links
Positions
Grids

Different from data types in programming

Items
‣ Discrete individual entity
‣ e.g., machine, worker, city
Attributes
‣ Measured, observed, or logged properties of items
‣ Aka variable, dimension, feature
‣ e.g., age, price, temperature
Links
‣ Relationship between items
‣ e.g., Facebook friendship, connections between circuit elements
Positions
‣ Spatial data providing location in 2D or 3D space
‣ e.g., long/lat pair of city, pixel in photo, voxels in MRI scan
Grids
‣ Sampling strategy for continuous data
‣ e.g., grid of weather stations in a region

16
Q

Name dataset types

A

Tables
Networks & Trees
Fields
Geometry
Clusters, Sets, Lists

Collection of information that is target of analysis

17
Q

Name Attribute Types

A

Categorical (nominal)
Ordered: Ordinal
Ordered: Quantitative

Which classes of values & measurements are there?

Categorical (nominal)
‣ Compare equality, no implicit order
‣ e.g., fruit, gender, product category, file types
Ordered
Ordinal
* Greater/less than defined
* e.g., shirt size, rankings
Quantitative
* Arithmetic possible
* e.g., length, weight, count
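
The operations that each attribute type supports can be shown directly. This is a small sketch with made-up values; the size ranking `SIZE_ORDER` is an assumption for the example.

```python
# Categorical: only equality is meaningful.
same_fruit = "apple" == "pear"

# Ordinal: order comes from an explicit ranking, not from the labels themselves.
SIZE_ORDER = ["S", "M", "L", "XL"]  # hypothetical shirt-size scale

def size_less_than(a, b):
    """Compare two sizes by their rank in SIZE_ORDER."""
    return SIZE_ORDER.index(a) < SIZE_ORDER.index(b)

# Quantitative: arithmetic is meaningful.
lengths_m = [1.5, 2.0, 4.5]
total_length = sum(lengths_m)
```

Comparing the labels as plain strings would give the wrong order (`"M" < "L"` is false alphabetically), which is why ordinal data needs its ranking stated explicitly.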

18
Q

What are the types of ordering Directions?

A

Sequential
Diverging
Cyclic

Sequential
‣ Homogeneous from min to max
‣ e.g., # people in countries
Diverging
‣ Two or multiple sequences that meet at common zero point
‣ e.g., elevation dataset (above sea level & below sea level)
Cyclic
‣ Values wrap around, as with time (hour, week, month, year)
‣ e.g., seasons of the year
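
Cyclic ordering is the case where plain subtraction misleads: 23:00 and 01:00 are two hours apart, not 22. A minimal sketch, with a hypothetical function name:

```python
def cyclic_distance(a, b, period):
    """Shortest distance between two values on a cycle of the given period."""
    diff = abs(a - b) % period
    return min(diff, period - diff)

hours_apart = cyclic_distance(23, 1, 24)    # hours of the day
months_apart = cyclic_distance(11, 0, 12)   # months, numbered 0..11
```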

19
Q

What is Task Abstraction?

A

The formulation of domain-independent tasks.