All Flashcards by Grimson Action

What is Exploratory Data Analysis?

Techniques for summarising, visualising and reviewing data; the first step to analysing data

What is tabular data?

Each record or observation represents a set of measurements of a single object or event

What are the four things you want visualisation to show?

Distribution, Relationship, Composition, Comparison

What is Distribution?

How a variable or variables in the dataset distribute over a range of possible values

What is Relationship?

How the values of multiple variables in the dataset relate

What is Composition?

How the dataset breaks down into subgroups

What is Comparison?

How trends in multiple variables or datasets compare

Why rescale a graph?

To make the data easier to see, and to find a 'law': on a suitable scale (for example, logarithmic axes) the relationship may appear as a straight line
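
As a rough illustration of finding a 'law' by rescaling, the sketch below takes made-up data that follows a power law and shows that it becomes a straight line once both axes are logarithmic. The data, the `fit_line` helper and the variable names are assumptions for this example only.

```python
import math

# Hypothetical data following y = 3 * x**2 (a power law).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 2 for x in xs]


def fit_line(us, vs):
    """Ordinary least-squares fit of v = slope * u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    sxx = sum((u - mean_u) ** 2 for u in us)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u


# Rescale both axes to logarithms: log y = log 3 + 2 * log x is a straight line.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
slope, intercept = fit_line(log_xs, log_ys)

exponent = slope                  # slope of the line recovers the power
coefficient = math.exp(intercept)  # intercept recovers the multiplier
```

On the rescaled axes the slope of the line gives the exponent of the power law and the intercept gives its coefficient.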

What is Data Wrangling?

Exploring and transforming data to gain valuable insights

What are the steps of Data Wrangling?

Obtain, Understand, Explore, Transform, Augment, Visualise

What are the types of missingness?

Missing Completely at Random, Missing at Random, Not Missing at Random

What is Missing Completely at Random?

The probability that the feature is missing is independent of both its own value and the values of any other features

What is Missing at Random?

The probability that the feature is missing is independent of the feature's own value, but can be affected by the values of other features

What is Not Missing at Random?

The probability that the feature is missing can be dependent on the value of the feature

How can you deal with Missing Completely at Random?

Only use complete data: dropping the incomplete records loses data but does not introduce bias

How can you deal with Missing at Random?

You can try to predict the missing values from the other features. Deleting the incomplete records would bias the data
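
One very simple way to predict missing values from another feature is to fill each gap with the mean of the observed values that share the same value of that feature. The sketch below assumes made-up records and a hypothetical helper `impute_by_group`; real imputation would normally use a proper predictive model.

```python
# Hypothetical records: "value" is sometimes missing (None), "group" never is.
records = [
    {"group": "a", "value": 10.0},
    {"group": "a", "value": 12.0},
    {"group": "a", "value": None},
    {"group": "b", "value": 20.0},
    {"group": "b", "value": None},
    {"group": "b", "value": 24.0},
]


def impute_by_group(rows, key, target):
    """Fill missing `target` values with the mean of observed values in the same `key` group.

    Assumes every group has at least one observed value; otherwise a KeyError is raised.
    """
    observed = {}
    for row in rows:
        if row[target] is not None:
            observed.setdefault(row[key], []).append(row[target])
    means = {group: sum(vals) / len(vals) for group, vals in observed.items()}
    # Build new dicts so the original records are left untouched.
    return [
        {**row, target: means[row[key]]} if row[target] is None else dict(row)
        for row in rows
    ]


imputed = impute_by_group(records, "group", "value")
```

The original records are not modified, so the imputed and raw versions can be compared afterwards.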

How can you deal with Not Missing at Random?

You cannot do much against this

What is faceting?

Apply the same analysis to many comparable subsets of data, then put them side by side

What are the two kinds of variables and how are they classified?

Identifier variables are the variables that we set up. Measurement variables are the variables we measure

What is a general strategy for working with larger data sets?

Split the problem into smaller pieces, work on each piece individually, recombine the pieces. This is known as split-apply-combine
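
A minimal sketch of split-apply-combine using only the standard library. The rows and the helper name `split_apply_combine` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (region, sales) rows.
rows = [("north", 10), ("south", 4), ("north", 20), ("south", 6), ("east", 7)]


def split_apply_combine(rows, apply):
    # Split: collect the values belonging to each key.
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Apply the function to each piece, then combine the results into one dict.
    return {key: apply(values) for key, values in groups.items()}


means = split_apply_combine(rows, lambda values: sum(values) / len(values))
```

Here `means` holds the average for each region; any other per-group function can be passed in the same way.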

What is Visual Encoding?

The way in which data is mapped into visual structures

What is Visual Perception?

The ability to interpret the surrounding environment by processing information

What three things should you consider when encoding?

Importance, Expressiveness and Consistency

What is Pre-Attentive Processing?

The subconscious accumulation of information from the environment

What are the four types of data?

Nominal, Ordinal, Discrete, Continuous

What is Nominal data?

Named Categories

What is Ordinal data?

Categories with an implied order

What is Discrete data?

Numerical data that can take only particular values, such as counts

What is Continuous data?

Any numerical value

What is a Contingency Table?

A table of counts of cases: the rows and columns are labelled with the values of categorical variables, and each cell holds the count of cases with that combination
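
A contingency table can be sketched by counting how often each pair of categories occurs. The observations below (a drink and a time of day) are invented for this example.

```python
from collections import Counter

# Hypothetical observations: (drink, time of day) for each case.
observations = [
    ("tea", "am"), ("tea", "pm"), ("coffee", "am"),
    ("coffee", "am"), ("tea", "pm"), ("coffee", "pm"),
]

counts = Counter(observations)
row_labels = sorted({drink for drink, _ in observations})
col_labels = sorted({time for _, time in observations})

# One row per drink, one column per time of day; each cell is a count.
# Counter returns 0 for combinations that never occur.
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
```

With this data the rows are `coffee` and `tea`, the columns are `am` and `pm`, and the cells sum to the number of observations.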

What are the two types of study?

Observational study, using existing observations and plentiful data, versus experimental study, where you create a specific experiment to gather the data for the study

What are Confounders?

A confounder is a variable that causes changes in both the identifier and measurement variables

Give the steps for hypothesis testing

Specify the null (H0) and alternate (H1) hypotheses, assume the null hypothesis is true, and use the data to calculate the p-value. If the p-value is below the chosen significance level, reject H0 in favour of H1; otherwise, do not reject H0
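
The steps can be sketched with a coin-flip example: H0 says the coin is fair, H1 says it favours heads. The numbers (9 heads in 10 flips), the threshold `alpha = 0.05` and the function name are assumptions chosen for illustration.

```python
from math import comb


def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


alpha = 0.05  # assumed significance threshold
p_value = binomial_p_value(heads=9, flips=10)  # computed assuming H0 is true
reject_null = p_value < alpha  # a small p-value counts as evidence against H0
```

Here the p-value is 11/1024, about 0.011, so H0 would be rejected at the 0.05 level. With 5 heads in 10 flips the p-value is well above 0.05 and H0 would not be rejected.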

What are the two types of error?

Type 1 error: H0 is rejected when in reality it is true. Type 2 error: H0 is not rejected when in reality it is false

What is the p value?

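The p-value is the lowest significance level at which the null hypothesis would be rejected. Equivalently, it is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed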

What is A/B Testing?

Randomized controlled trials used by companies
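
The 'randomized' part can be sketched as a random split of users into two groups, one for each variant. The helper `assign_groups`, the user ids and the fixed seed are assumptions for this example.

```python
import random


def assign_groups(user_ids, seed=0):
    """Randomly split users into two groups (A and B) of near-equal size."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


group_a, group_b = assign_groups(range(10))
```

Each user lands in exactly one group, and because the assignment is random, other characteristics of the users should be balanced between the groups on average.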

What is Dimensionality?

The number of measurements available for each example in a dataset

What is a Multivariate visualisation?

Visualisation of datasets that have more than three variables

Why reduce dimensionality?

To reduce strain on computers and allow use by humans

What are the two types of non-linear dimensionality reduction?

Global method assumes that all pairwise distances are of equal importance, while the local method assumes that only the local distances are reliable

What are the two types of linear dimensionality reduction?

PCA (Principal Components Analysis) finds the directions that have the most variance, while MDS (Multi-Dimensional Scaling) arranges the points to minimise the discrepancy between their distances in the reduced space and their distances in the original data
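
For two-dimensional data, the direction of most variance can be found in closed form from the covariance matrix. The sketch below is a minimal illustration of that idea, not a general PCA implementation; the function name and the sample points are assumptions.

```python
import math


def first_principal_direction(points):
    """Unit vector along the direction of greatest variance for 2D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Angle of the major axis of the covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


# Hypothetical points lying exactly on the line y = 2x.
direction = first_principal_direction([(0, 0), (1, 2), (2, 4), (3, 6)])
```

For these points the direction is parallel to (1, 2), so projecting onto it keeps all of the variance in a single dimension.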