All Flashcards
What is Exploratory Data Analysis?
Techniques for summarising, visualising and reviewing data; the first step to analysing data
What is tabular data?
Data arranged in rows and columns, where each record or observation (row) represents a set of measurements of a single object or event
What are the four things you want visualisation to show?
Distribution, Relationship, Composition, Comparison
What is Distribution?
How a variable or variables in the dataset distribute over a range of possible values
What is Relationship?
How the values of multiple variables in the dataset relate
What is Composition?
How the dataset breaks down into subgroups
What is Comparison?
How trends in multiple variables or datasets compare
Why rescale a graph?
To make the data easier to see, and to find a ‘law’: a scale (for example logarithmic) on which the data fall on a straight line
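A minimal sketch of finding a 'law' by rescaling, assuming the data follow a power law. The measurements below are hypothetical (generated from y = 3 * x^2), and `fit_line` is an illustrative helper, not part of any library. On log-log axes a power law becomes a straight line whose slope is the exponent.

```python
import math

# Hypothetical measurements following y = 3 * x**2 (assumed for illustration).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 2 for x in xs]

# Rescale both axes logarithmically.
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]


def fit_line(u, v):
    """Ordinary least-squares fit of v = slope * u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sxx = sum((a - mean_u) ** 2 for a in u)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u


# On the log-log scale the slope is the exponent and
# 10 ** intercept is the multiplicative coefficient.
slope, intercept = fit_line(log_xs, log_ys)
exponent = slope
coefficient = 10 ** intercept
print(f"y = {coefficient:.2f} * x^{exponent:.2f}")
```

If the points do not fall on a straight line after rescaling, a different scale (for example log on one axis only, which suits exponential growth) may be worth trying.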
What is Data Wrangling?
Exploring and transforming data to gain valuable insights
What are the steps of Data Wrangling?
Obtain, Understand, Explore, Transform, Augment, Visualise
What are the types of missingness?
Missing Completely at Random, Missing at Random, Not Missing at Random
What is Missing Completely at Random?
The probability that the feature is missing is independent of both its own value and the values of any other features
What is Missing at Random?
The probability that the feature is missing is independent of the feature's own value but can depend on the values of other features
What is Not Missing at Random?
The probability that the feature is missing depends on the value of the feature itself
How can you deal with Missing Completely at Random?
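Use only the complete records; because the missingness is unrelated to any value, dropping incomplete records does not bias the result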
How can you deal with Missing at Random?
You can try to predict (impute) the missing values from the other features. Simply deleting the incomplete records would introduce bias
How can you deal with Not Missing at Random?
You cannot do much about this, because the cause of the missingness is the very value you cannot observe
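The three types of missingness can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are synthetic, income is assumed to rise with age, and the missingness probabilities (0.3, 0.6, 0.1) and the helper `observed_mean` are made up for illustration. It compares the mean of the surviving values with the true mean when only complete records are used.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic records: income rises with age (an assumption for illustration).
records = []
for _ in range(20_000):
    age = random.uniform(20, 60)
    income = 1000 * age + random.gauss(0, 5000)
    records.append((age, income))


def observed_mean(p_missing):
    """Mean income over records whose income survives the missingness rule."""
    kept = [income for age, income in records
            if random.random() >= p_missing(age, income)]
    return sum(kept) / len(kept)


true_mean = sum(income for _, income in records) / len(records)

# MCAR: the chance of being missing ignores every value.
mcar_mean = observed_mean(lambda age, income: 0.3)
# MAR: the chance depends on another feature (age), not on income itself.
mar_mean = observed_mean(lambda age, income: 0.6 if age > 40 else 0.1)
# NMAR: the chance depends on the income value itself.
nmar_mean = observed_mean(lambda age, income: 0.6 if income > 40_000 else 0.1)

print(f"true {true_mean:.0f}  MCAR {mcar_mean:.0f}  "
      f"MAR {mar_mean:.0f}  NMAR {nmar_mean:.0f}")
```

Under these assumptions the MCAR mean stays close to the true mean, while deleting incomplete records under MAR or NMAR pulls the mean down. In the MAR case the bias could in principle be corrected using age, since age is still observed; in the NMAR case no observed feature explains the gap.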
What is faceting?
Apply the same analysis to many comparable subsets of data, then put them side by side
What are the two kinds of variables and how are they classified?
Identifier variables are the variables that we set up. Measurement variables are the variables we measure
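One place this distinction is commonly used is when reshaping a table from wide to long form ('melting'). The sketch below is an assumption about how the two kinds of variables might be applied, not a method stated in these cards: the table, the column names and the `melt` helper are all hypothetical. Identifier columns are repeated on every output row, and each measurement column becomes its own row.

```python
# Hypothetical wide table: two identifier variables, two measurement variables.
wide = [
    {"subject": "A", "treatment": "drug", "week1": 5.1, "week2": 4.8},
    {"subject": "B", "treatment": "placebo", "week1": 5.0, "week2": 5.2},
]
id_vars = ["subject", "treatment"]


def melt(rows, id_vars):
    """Turn each measurement column into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_vars}
        for name, value in row.items():
            if name not in id_vars:
                long_rows.append({**ids, "variable": name, "value": value})
    return long_rows


long_rows = melt(wide, id_vars)
for row in long_rows:
    print(row)
```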
What is a general strategy for working with larger data sets?
Split the problem into smaller pieces, work on each piece individually, recombine the pieces. This is known as split-apply-combine
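The three steps of split-apply-combine can be sketched with the standard library alone. The rows and group names below are hypothetical, and the mean is only one example of a function to apply to each piece.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group, value) observations.
rows = [("a", 2.0), ("b", 10.0), ("a", 4.0), ("b", 14.0), ("c", 7.0)]

# Split: partition the rows into one list of values per group.
groups = defaultdict(list)
for group, value in rows:
    groups[group].append(value)

# Apply: compute the same summary on each piece independently.
# Combine: collect the per-group results into a single structure.
summary = {group: mean(values) for group, values in groups.items()}

print(summary)
```

Because each piece is processed independently, the apply step can be run piece by piece on data too large to handle at once.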
What is Visual Encoding?
The way in which data is mapped into visual structures
What is Visual Perception?
The ability to interpret the surrounding environment by processing visual information
What three things should you consider when encoding?
Importance, Expressiveness and Consistency
What is Pre-Attentive Processing?
The subconscious accumulation of information from the environment