All Flashcards

1
Q

What is Exploratory Data Analysis?

A

Techniques for summarising, visualising and reviewing data; the first step in analysing a dataset

2
Q

What is tabular data?

A

Data arranged in rows and columns, where each row (a record or observation) holds a set of measurements of a single object or event

3
Q

What are the four things you want visualisation to show?

A

Distribution, Relationship, Composition, Comparison

4
Q

What is Distribution?

A

How a variable or variables in the dataset distribute over a range of possible values

5
Q

What is Relationship?

A

How the values of multiple variables in the dataset relate

6
Q

What is Composition?

A

How the dataset breaks down into subgroups

7
Q

What is Comparison?

A

How trends in multiple variables or datasets compare

8
Q

Why rescale a graph?

A

To increase visibility, and to find a ‘law’ (find a straight line)
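As a rough illustration of finding a 'law' by rescaling: a power law y = a * x^k becomes a straight line on log-log axes, because log y = log a + k * log x. The sketch below uses made-up data and a hypothetical helper, fit_line, to recover the exponent from the slope of that line.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data that follows y = 3 * x^2 (invented for illustration).
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]

# Rescale both axes logarithmically; the curve becomes a straight line.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]

slope, intercept = fit_line(log_xs, log_ys)
exponent = slope                   # k in y = a * x^k
coefficient = math.exp(intercept)  # a in y = a * x^k
```

With real, noisy data the points would only approximately fall on a line, and the fitted slope would be an estimate of the exponent.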

9
Q

What is Data Wrangling?

A

Exploring and transforming data to make valuable insights

10
Q

What are the steps of Data Wrangling?

A

Obtain, Understand, Explore, Transform, Augment, Visualise

11
Q

What are the types of missingness?

A

Missing Completely at Random, Missing at Random, Not Missing at Random

12
Q

What is Missing Completely at Random?

A

The probability that the feature is missing is independent of both its own value and the values of any other features

13
Q

What is Missing at Random?

A

The probability that the feature is missing is independent of the feature but can be affected by the values of other features

14
Q

What is Not Missing at Random?

A

The probability that the feature is missing can be dependent on the value of the feature

15
Q

How can you deal with Missing Completely at Random?

A

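Only use the complete records; because the missingness is unrelated to the data, dropping incomplete records does not introduce bias, although it reduces the sample size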

16
Q

How can you deal with Missing at Random?

A

You can try to predict (impute) the missing values from the other features. Simply deleting the incomplete records would bias the results

17
Q

How can you deal with Not Missing at Random?

A

You cannot do much against this
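The two tractable cases can be sketched side by side: keeping only complete records (suitable for Missing Completely at Random) and predicting missing values from another feature (one option for Missing at Random). The records, the field names "group" and "height", and the choice of a group mean as the predictor are all assumptions made for illustration; real imputation often uses a fitted model instead.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"group": "a", "height": 150.0},
    {"group": "a", "height": None},
    {"group": "a", "height": 160.0},
    {"group": "b", "height": 180.0},
    {"group": "b", "height": None},
    {"group": "b", "height": 190.0},
]

# Missing Completely at Random: keep only the complete records.
complete = [r for r in records if r["height"] is not None]

# Missing at Random: predict each missing value from another feature.
# Here the prediction is simply the mean height of the same group.
group_means = {
    g: mean(r["height"] for r in complete if r["group"] == g)
    for g in {r["group"] for r in complete}
}

imputed = [
    {**r, "height": r["height"] if r["height"] is not None
                    else group_means[r["group"]]}
    for r in records
]
```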

18
Q

What is faceting?

A

Apply the same analysis to many comparable subsets of data, then put them side by side

19
Q

What are the two kinds of variables and how are they classified?

A

Identifier variables are the variables that we set up. Measurement variables are the variables we measure

20
Q

What is a general strategy for working with larger data sets?

A

Split the problem into smaller pieces, work on each piece individually, recombine the pieces. This is known as split-apply-combine
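A minimal sketch of split-apply-combine, using only the standard library. The sales data and region names are invented for illustration; libraries such as pandas offer the same pattern through grouping operations.

```python
from collections import defaultdict

# Hypothetical (region, amount) records.
sales = [("north", 10), ("south", 5), ("north", 20), ("south", 15), ("east", 7)]

# Split: partition the records by region.
groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)

# Apply: work on each piece individually (here, sum the amounts).
totals_by_region = {region: sum(amounts) for region, amounts in groups.items()}

# Combine: bring the per-piece results back together.
grand_total = sum(totals_by_region.values())
```

Because each piece is processed independently, the apply step can be run in parallel or on chunks that fit in memory, which is why the strategy suits larger data sets.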

21
Q

What is Visual Encoding?

A

The way in which data is mapped into visual structures

22
Q

What is Visual Perception?

A

The ability to interpret the surrounding environment by processing information

23
Q

What three things should you consider when encoding?

A

Importance, Expressiveness and Consistency

24
Q

What is Pre-Attentive Processing?

A

The subconscious accumulation of information from the environment

25
What are the four types of data?
Nominal, Ordinal, Discrete, Continuous
26
What is Nominal data?
Named Categories
27
What is Ordinal data?
Categories with an implied order
28
What is Discrete data?
Only particular numbers
29
What is Continuous data?
Any numerical value
30
What is a Contingency Table?
A table of counts: the rows and columns are labelled with the levels of two categorical variables, and each cell holds the number of cases with that combination of levels
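As a small illustration, a contingency table can be built by counting how often each pair of category labels occurs. The observations below (a study arm paired with an outcome) are made up for the example.

```python
from collections import Counter

# Hypothetical observations: (study arm, outcome) for each case.
observations = [
    ("treated", "recovered"),
    ("treated", "recovered"),
    ("treated", "not recovered"),
    ("control", "recovered"),
    ("control", "not recovered"),
    ("control", "not recovered"),
]

# Count the cases for every combination of the two categorical variables.
counts = Counter(observations)

# Row and column labels are the distinct levels of each variable.
rows = sorted({arm for arm, _ in observations})
cols = sorted({outcome for _, outcome in observations})

# Each cell is the count for that (row, column) combination.
table = [[counts[(r, c)] for c in cols] for r in rows]
```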
31
What are the two types of study?
Observational study, using existing observations and plentiful data, versus experimental study, where you create a specific experiment to gather the data for the study
32
What are Confounders?
A confounder is a variable that causes changes in both the identifier and measurement variables
33
Give the steps for hypothesis testing
Specify the null (H0) and alternative (H1) hypotheses; assume the null hypothesis is true; use the data to calculate the p-value; if the p-value is below the chosen significance level, reject H0 in favour of H1, otherwise fail to reject H0
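One way to see these steps in action is an exact permutation test, sketched below under stated assumptions: the two samples are invented, H0 is "the group labels make no difference to the mean", and the 0.05 significance level is a conventional choice rather than anything fixed by the cards. Other tests (such as a t-test) follow the same outline with a different way of computing the p-value.

```python
from itertools import combinations

# Hypothetical measurements for two groups.
group_a = [12, 15, 14, 16]
group_b = [8, 9, 11, 10]

def average(values):
    return sum(values) / len(values)

# Test statistic: difference in group means.
observed = average(group_a) - average(group_b)

# Assume H0 is true: labels are interchangeable, so every way of
# relabelling the pooled data is equally likely.
pooled = group_a + group_b
extreme = 0
total = 0
for chosen_idx in combinations(range(len(pooled)), len(group_a)):
    chosen = [pooled[i] for i in chosen_idx]
    rest = [pooled[i] for i in range(len(pooled)) if i not in chosen_idx]
    diff = average(chosen) - average(rest)
    # Count relabellings at least as extreme as the observed one (two-sided).
    if abs(diff) >= abs(observed) - 1e-12:
        extreme += 1
    total += 1

# p-value: share of relabellings at least as extreme as the data.
p_value = extreme / total

alpha = 0.05                   # assumed significance level
reject_null = p_value < alpha  # reject H0 only if the p-value is small
```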
34
What are the two types of error?
Type 1 error, H0 is rejected when in reality it is true, or type 2 error, H0 is not rejected when in reality it is false
35
What is the p value?
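The p-value is the lowest significance level at which the null hypothesis would be rejected; equivalently, it is the probability, assuming H0 is true, of observing a result at least as extreme as the one actually observed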
36
What is A/B Testing?
Randomized controlled trials used by companies
37
What is Dimensionality?
The number of measurements available for each example in a dataset
38
What is a Multivariate visualisation?
Visualisation of datasets that have more than three variables
39
Why reduce dimensionality?
To reduce the computational cost of processing the data and to make it possible for humans to view and interpret it
40
What are the two types of reducing dimensionality in a non linear way?
Global method assumes that all pairwise distances are of equal importance, while the local method assumes that only the local distances are reliable
41
What are the two types of reducing dimensionality in a linear way?
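PCA (Principal Components Analysis) finds the directions along which the data has the most variance, while MDS (Multi-Dimensional Scaling) arranges the points in a low-dimensional space so as to minimise the discrepancy between the pairwise distances there and in the original data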
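To make "the direction of most variance" concrete, here is a minimal two-dimensional PCA sketch using only the standard library. The points are invented and lie exactly on y = 2x, and power iteration is just one simple way to find the leading eigenvector of the covariance matrix; practical code would normally rely on a linear algebra library.

```python
import math

# Hypothetical 2-D points lying on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Centre the data so the variance is measured around the mean.
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centred = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix.
sxx = sum(x * x for x, _ in centred) / (n - 1)
syy = sum(y * y for _, y in centred) / (n - 1)
sxy = sum(x * y for x, y in centred) / (n - 1)

# Power iteration: repeatedly apply the covariance matrix and normalise.
# The vector converges to the first principal direction.
vx, vy = 1.0, 0.0
for _ in range(100):
    nx = sxx * vx + sxy * vy
    ny = sxy * vx + syy * vy
    norm = math.hypot(nx, ny)
    vx, vy = nx / norm, ny / norm

# Reduce to one dimension by projecting each point onto that direction.
projected = [x * vx + y * vy for x, y in centred]
```

For this data the direction found is parallel to the line y = 2x, so the one-dimensional projection keeps all of the variance.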