All Flashcards
What is Exploratory Data Analysis?
Techniques for summarising, visualising and reviewing data; the first step in analysing data
What is tabular data?
Data arranged in rows and columns, where each record (row) or observation represents a set of measurements of a single object or event, and each column holds one variable (feature)
What are the four things you want visualisation to show?
Distribution, Relationship, Composition, Comparison
What is Distribution?
How the values of one or more variables in the dataset are distributed over the range of possible values
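A minimal sketch of summarising a distribution, using made-up exam scores (hypothetical data). It prints the mean, median, sample standard deviation and quartiles, then draws a text histogram with 10-point bins. The helper `histogram` is an illustrative name, not from any library.

```python
import statistics

# Hypothetical sample: exam scores for 12 students (illustrative only)
scores = [52, 58, 61, 64, 66, 67, 70, 71, 73, 78, 85, 94]

# Numerical summary of the distribution
mean = statistics.mean(scores)
median = statistics.median(scores)
stdev = statistics.stdev(scores)                # sample standard deviation
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points

def histogram(values, bin_width):
    """Count values falling into bins [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

print(f"mean={mean:.1f} median={median} sd={stdev:.1f} quartiles={q1}, {q2}, {q3}")
bins = histogram(scores, 10)
for start, count in bins.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

The numbers describe the centre and spread; the histogram shows the shape (here most scores cluster in the 60s and 70s, with a short tail to the right).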
What is Relationship?
How the values of multiple variables in the dataset relate to one another
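One common way to quantify a relationship between two numeric variables is the Pearson correlation coefficient r (between -1 and 1). A minimal sketch on hypothetical paired data (hours studied vs. score); `pearson_r` is an illustrative helper, and Pearson's r only captures linear relationships.

```python
import math

# Hypothetical paired measurements: hours studied vs exam score
hours = [1, 2, 3, 4, 5, 6]
score = [50, 55, 61, 64, 70, 77]

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are enough.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(hours, score)
print(f"r = {r:.3f}")  # close to 1: a strong positive linear relationship
```

A scatter plot of the same two columns is the visual counterpart of this number.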
What is Composition?
How the dataset breaks down into subgroups
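Composition is usually shown as shares of a whole (pie chart, stacked bar). A minimal sketch that computes those shares from a hypothetical categorical column of survey answers:

```python
from collections import Counter

# Hypothetical categorical column: transport mode of 10 survey respondents
transport = ["car", "bus", "car", "bike", "walk", "car", "bus", "car", "bike", "car"]

counts = Counter(transport)
total = sum(counts.values())

# Proportion of the dataset in each subgroup, largest first
shares = {mode: count / total for mode, count in counts.most_common()}
for mode, share in shares.items():
    print(f"{mode:<5} {share:.0%}")
```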
What is Comparison?
How trends in multiple variables or datasets compare
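A minimal sketch of comparing trends across two datasets, assuming hypothetical monthly sales for two stores. Each series is summarised by its mean and a least-squares trend (units per month), so the stores can be compared side by side; `slope` is an illustrative helper.

```python
import statistics

# Hypothetical monthly sales (units) for two stores over the same six months
sales = {
    "store_a": [120, 125, 130, 138, 145, 150],
    "store_b": [200, 198, 195, 190, 188, 185],
}

def slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ... (units per month)."""
    xs = list(range(len(ys)))
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

for name, ys in sales.items():
    print(f"{name}: mean={statistics.mean(ys):.1f}, trend={slope(ys):+.2f} per month")
```

Store B sells more on average, but store A is growing while store B is declining, which a comparison of means alone would hide.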
Why rescale a graph?
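To make patterns easier to see (e.g. when values span several orders of magnitude), and to find a ‘law’: rescaling the axes (e.g. log or log-log) can turn a curve into a straight line, revealing an exponential or power-law relationship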
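A minimal sketch of rescaling to find a law, using noise-free hypothetical data generated from the power law y = 3x². Taking logs of both axes gives log y = log 3 + 2 log x, a straight line, so a linear fit on the rescaled data recovers the exponent (slope) and prefactor (10 to the power of the intercept). On a semi-log plot (log y only), an exponential law becomes a straight line in the same way.

```python
import math

# Hypothetical data following a power law y = 3 * x**2 (no noise, for illustration)
xs = [1, 2, 4, 8, 16, 32]
ys = [3 * x ** 2 for x in xs]

# Log-log rescaling: the curve becomes log10(y) = log10(3) + 2 * log10(x)
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

slope, intercept = fit_line(log_x, log_y)
print(f"exponent ≈ {slope:.3f}, prefactor ≈ {10 ** intercept:.3f}")
```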
What is Data Wrangling?
Exploring and transforming raw data into a form from which valuable insights can be drawn
What are the steps of Data Wrangling?
Obtain, Understand, Explore, Transform, Augment, Visualise
What are the types of missingness?
Missing Completely at Random, Missing at Random, Not Missing at Random
What is Missing Completely at Random?
The probability that the feature is missing is independent of the value of the feature itself and of the values of any other features
What is Missing at Random?
The probability that the feature is missing is independent of the feature's own value, but can depend on the (observed) values of other features
What is Not Missing at Random?
The probability that the feature is missing depends on the value of the missing feature itself
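A minimal simulation of the three types of missingness, assuming a hypothetical dataset where income rises with age. The same income column is masked three ways and the mean of the observed incomes is compared with the true mean: under MCAR it stays close, while under MAR (missingness driven by age) and NMAR (missingness driven by income itself) it is biased downward, because high incomes go missing more often. The probabilities 0.5 and 0.05 are arbitrary illustrative choices.

```python
import random

random.seed(0)

# Hypothetical dataset: age (always observed) and income (may go missing)
n = 10_000
ages = [random.randint(20, 70) for _ in range(n)]
incomes = [1000 * age + random.gauss(0, 5000) for age in ages]

# MCAR: every income has the same 20% chance of being missing
mcar = [random.random() < 0.2 for _ in range(n)]
# MAR: missingness depends on the observed age, not on income itself
mar = [random.random() < (0.5 if age > 50 else 0.05) for age in ages]
# NMAR: missingness depends on the (unobserved) income value itself
nmar = [random.random() < (0.5 if inc > 50_000 else 0.05) for inc in incomes]

def observed_mean(values, missing):
    """Mean of the values whose mask entry is False (i.e. not missing)."""
    kept = [v for v, m in zip(values, missing) if not m]
    return sum(kept) / len(kept)

true_mean = sum(incomes) / n
results = {name: observed_mean(incomes, mask)
           for name, mask in [("MCAR", mcar), ("MAR", mar), ("NMAR", nmar)]}
for name, mean in results.items():
    print(f"{name}: observed mean {mean:,.0f} (true {true_mean:,.0f})")
```

In the MAR case the bias can be corrected using age, because within each age group the observed incomes are still representative; in the NMAR case the observed data alone cannot reveal the bias.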
How can you deal with Missing Completely at Random?
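Use only the complete records (listwise deletion); because the missingness is random, dropping incomplete records does not bias the results, although it reduces the sample size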
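A minimal sketch of listwise (complete-case) deletion on hypothetical records, where `None` marks a missing value:

```python
# Hypothetical records; None marks a missing value
rows = [
    {"age": 34, "income": 41_000},
    {"age": 51, "income": None},
    {"age": 28, "income": 30_500},
    {"age": None, "income": 52_000},
    {"age": 45, "income": 47_200},
]

# Keep only rows in which no field is missing
complete = [row for row in rows if all(v is not None for v in row.values())]
print(f"kept {len(complete)} of {len(rows)} rows")
```

In pandas, `DataFrame.dropna()` with its default arguments drops every row that contains a missing value in the same way.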
How can you deal with Missing at Random?
Try to predict (impute) the missing values from the other features. Deleting the incomplete records would bias the results
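A minimal sketch of one way to predict missing values, regression imputation, assuming hypothetical records where income is sometimes missing but age is always observed. A straight line is fitted to the complete rows and used to fill in the gaps. This is one technique among several (e.g. k-nearest-neighbour or multiple imputation); a single regression imputation understates the variability of the filled-in values.

```python
# Hypothetical records: income is missing for some rows, age is always observed
rows = [
    {"age": 25, "income": 27_000},
    {"age": 32, "income": 33_500},
    {"age": 41, "income": None},
    {"age": 47, "income": 48_000},
    {"age": 55, "income": None},
    {"age": 60, "income": 61_000},
]

# Fit income ~ age on the rows where income was observed
observed = [(r["age"], r["income"]) for r in rows if r["income"] is not None]

def fit_line(points):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return slope, my - slope * mx

slope, intercept = fit_line(observed)

# Replace each missing income with the model's prediction from age
imputed = [
    {**r, "income": r["income"] if r["income"] is not None else slope * r["age"] + intercept}
    for r in rows
]
for r in imputed:
    print(r)
```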