All Flashcards

1
Q

What is Exploratory Data Analysis?

A

Techniques for summarising, visualising and reviewing data; the first step to analysing data

2
Q

What is tabular data?

A

Data arranged in rows and columns, where each row (a record or observation) holds a set of measurements of a single object or event

3
Q

What are the four things you want visualisation to show?

A

Distribution, Relationship, Composition, Comparison

4
Q

What is Distribution?

A

How a variable or variables in the dataset distribute over a range of possible values

5
Q

What is Relationship?

A

How the values of multiple variables in the dataset relate

6
Q

What is Composition?

A

How the dataset breaks down into subgroups

7
Q

What is Comparison?

A

How trends in multiple variables or datasets compare

8
Q

Why rescale a graph?

A

To make the data easier to see, and to find a ‘law’: a rescaling (for example, to logarithmic axes) under which the data fall on a straight line
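
As a minimal sketch of finding a ‘law’ by rescaling, the example below takes logarithms of both axes so that a power law becomes a straight line, then reads the exponent off the slope. The data and the helper name `fit_line` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical measurements that follow the power law y = 3 * x**2.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 2 for x in xs]

# Rescale both axes logarithmically: log y = log 3 + 2 * log x,
# which is a straight line in (log x, log y).
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]


def fit_line(us, vs):
    """Least-squares slope and intercept for v = slope * u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    sxx = sum((u - mean_u) ** 2 for u in us)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u


slope, intercept = fit_line(log_xs, log_ys)
exponent = slope               # the power in the law
coefficient = 10 ** intercept  # the multiplicative constant
```

The slope of the straight line recovers the exponent and the intercept recovers the constant, which is why a straight line on rescaled axes is taken as evidence of a simple law.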

9
Q

What is Data Wrangling?

A

Exploring and transforming data in order to gain valuable insights

10
Q

What are the steps of Data Wrangling?

A

Obtain, Understand, Explore, Transform, Augment, Visualise

11
Q

What are the types of missingness?

A

Missing Completely at Random, Missing at Random, Not Missing at Random

12
Q

What is Missing Completely at Random?

A

The probability that the feature is missing is independent of both its own value and the values of any other features

13
Q

What is Missing at Random?

A

The probability that the feature is missing is independent of the feature's own value, but can depend on the (observed) values of other features

14
Q

What is Not Missing at Random?

A

The probability that the feature is missing can depend on the (unobserved) value of the feature itself
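
The three kinds of missingness can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the dataset, the probabilities and the helper `observed_mean` are all hypothetical. `group` stands for another feature that is always observed, and `value` is the feature that may go missing.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Hypothetical data: 'group' is always observed, 'value' may go missing.
data = []
for _ in range(n):
    group = rng.random() < 0.5
    value = rng.gauss(60.0 if group else 40.0, 5.0)
    data.append((group, value))


def observed_mean(rows, p_missing):
    """Mean of 'value' over the rows that survive a missingness rule."""
    kept = [v for g, v in rows if rng.random() >= p_missing(g, v)]
    return sum(kept) / len(kept)


true_mean = sum(v for _, v in data) / n

# Missing Completely at Random: the same chance for every row.
mcar_mean = observed_mean(data, lambda g, v: 0.3)
# Missing at Random: the chance depends only on the observed 'group'.
mar_mean = observed_mean(data, lambda g, v: 0.6 if g else 0.1)
# Not Missing at Random: the chance depends on 'value' itself.
mnar_mean = observed_mean(data, lambda g, v: 0.6 if v > 55.0 else 0.1)
```

In this sketch the mean of the surviving values stays close to the true mean only in the first case; in the other two it is pulled downwards because high values are lost more often.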

15
Q

How can you deal with Missing Completely at Random?

A

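Use only the complete records (drop those with missing values); because the missingness is unrelated to the data, this does not introduce bias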

16
Q

How can you deal with Missing at Random?

A

You can try to predict (impute) the missing values from the other features. Deleting the incomplete records would bias the results
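
A minimal sketch of the idea, using a group mean as a deliberately simple stand-in for a prediction model. The records, the feature names and the helper `group_means` are hypothetical; here income goes missing more often for the "old" group, which is itself always observed.

```python
# Hypothetical records: (age_group, income); None marks a missing income.
records = [
    ("young", 20.0), ("young", 22.0), ("young", None),
    ("old", 40.0), ("old", None), ("old", None), ("old", 44.0),
]


def group_means(rows):
    """Mean of the observed values within each group."""
    totals = {}
    for group, value in rows:
        if value is not None:
            total, count = totals.get(group, (0.0, 0))
            totals[group] = (total + value, count + 1)
    return {g: total / count for g, (total, count) in totals.items()}


means = group_means(records)

# Predict each missing income from the other (observed) feature.
imputed = [(g, v if v is not None else means[g]) for g, v in records]
imputed_mean = sum(v for _, v in imputed) / len(imputed)

# For comparison: simply deleting the incomplete records.
observed = [v for _, v in records if v is not None]
complete_case_mean = sum(observed) / len(observed)
```

Deleting the incomplete records under-represents the "old" group and gives a lower mean (31.5) than the imputed data (33.0), which is the bias the card warns about.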

17
Q

How can you deal with Not Missing at Random?

A

You cannot do much about this using the data alone, because the missingness depends on the very values that were not observed

18
Q

What is faceting?

A

Apply the same analysis to many comparable subsets of data, then put them side by side

19
Q

What are the two kinds of variables and how are they classified?

A

Identifier variables are the variables that we set up. Measurement variables are the variables we measure

20
Q

What is a general strategy for working with larger data sets?

A

Split the problem into smaller pieces, work on each piece individually, recombine the pieces. This is known as split-apply-combine
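
Split-apply-combine can be sketched in a few lines of plain Python. The rows, the grouping key and the `summarise` function are hypothetical; any per-group computation could take its place.

```python
from collections import defaultdict

# Hypothetical rows: (key, value).
rows = [("a", 1), ("b", 10), ("a", 3), ("b", 30), ("c", 5)]

# Split: partition the rows into smaller pieces by key.
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)


# Apply: work on each piece individually (here, take its mean).
def summarise(values):
    return sum(values) / len(values)


# Combine: gather the per-piece results into one structure.
result = {key: summarise(values) for key, values in groups.items()}
```

Because each piece is processed independently, the apply step could also be run piece by piece on data too large to handle in one go.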

21
Q

What is Visual Encoding?

A

The way in which data is mapped into visual structures

22
Q

What is Visual Perception?

A

The ability to interpret the surrounding environment by processing visual information

23
Q

What three things should you consider when encoding?

A

Importance, Expressiveness and Consistency

24
Q

What is Pre-Attentive Processing?

A

The subconscious accumulation of information from the environment

25
Q

What are the four types of data?

A

Nominal, Ordinal, Discrete, Continuous

26
Q

What is Nominal data?

A

Named Categories

27
Q

What is Ordinal data?

A

Categories with an implied order

28
Q

What is Discrete data?

A

Only particular numbers

29
Q

What is Continuous data?

A

Any numerical value

30
Q

What is a Contingency Table?

A

A table of counts of cases: the rows and columns are labelled with the categories of two categorical variables, and each cell holds the number of cases with that combination of categories
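
A small sketch of building such a table by counting pairs of categories. The cases and the two variables (drink and size) are hypothetical.

```python
from collections import Counter

# Hypothetical cases, each described by two categorical variables.
cases = [
    ("coffee", "large"), ("coffee", "small"), ("tea", "small"),
    ("coffee", "large"), ("tea", "small"), ("tea", "large"),
]

# Count how many cases fall into each combination of categories.
counts = Counter(cases)

# Row and column labels are the categories of each variable.
row_labels = sorted({drink for drink, _ in cases})
col_labels = sorted({size for _, size in cases})

# Cell (i, j) is the count for row category i and column category j.
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
```

With these cases the rows are `coffee` and `tea`, the columns are `large` and `small`, and the table is `[[2, 1], [1, 2]]`.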

31
Q

What are the two types of study?

A

Observational study, using existing observations and plentiful data, versus experimental study, where you create a specific experiment to gather the data for the study

32
Q

What are Confounders?

A

A confounder is a variable that causes changes in both the identifier and measurement variables

33
Q

Give the steps for hypothesis testing

A

Specify the null (H0) and alternative (H1) hypotheses and choose a significance level; assume the null hypothesis is true and use the data to calculate the p-value; if the p-value is below the significance level, reject H0 in favour of H1, otherwise do not reject H0
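
One way to walk through these steps is an exact permutation test, sketched below. This is an illustrative choice of test, not necessarily the one the course uses; the two groups, the significance level `alpha` and the one-sided alternative are all assumptions.

```python
from itertools import combinations

# Hypothetical measurements from two groups.
group_a = [12, 14, 15, 16]
group_b = [9, 10, 11, 13]
alpha = 0.05  # assumed significance level

# H0: the group labels do not matter. H1: group A has the larger mean.


def mean(values):
    return sum(values) / len(values)


observed = mean(group_a) - mean(group_b)

# Assume H0 is true: every relabelling of the pooled data is equally likely.
pooled = group_a + group_b
total = 0
extreme = 0
for idx in combinations(range(len(pooled)), len(group_a)):
    chosen = set(idx)
    new_a = [pooled[i] for i in chosen]
    new_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    # Count relabellings at least as extreme as what was observed.
    if mean(new_a) - mean(new_b) >= observed:
        extreme += 1

p_value = extreme / total
reject_h0 = p_value < alpha
```

Here 2 of the 70 possible relabellings are at least as extreme as the observed split, so the p-value is 2/70 (about 0.029) and H0 is rejected at the 0.05 level.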

34
Q

What are the two types of error?

A

Type 1 error: H0 is rejected when in reality it is true. Type 2 error: H0 is not rejected when in reality it is false

35
Q

What is the p value?

A

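The p-value is the smallest significance level at which the null hypothesis would be rejected. Equivalently, it is the probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually observed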

36
Q

What is A/B Testing?

A

Randomised controlled trials used by companies, typically to compare two variants (A and B) of a product

37
Q

What is Dimensionality?

A

The number of measurements available for each example in a dataset

38
Q

What is a Multivariate visualisation?

A

Visualisation of datasets that have more than three variables

39
Q

Why reduce dimensionality?

A

To reduce the computational cost and to make the data usable (for example, plottable) by humans

40
Q

What are the two types of non-linear dimensionality reduction?

A

Global method assumes that all pairwise distances are of equal importance, while the local method assumes that only the local distances are reliable

41
Q

What are the two types of linear dimensionality reduction?

A

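PCA (Principal Components Analysis) finds the directions that have the most variance, while MDS (Multi-Dimensional Scaling) arranges the points in a low-dimensional space so as to minimise the discrepancy between their distances there and their distances in the original data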
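
The PCA half of this answer can be sketched for two variables without any libraries, using the closed-form eigenvalue of a 2x2 covariance matrix. The sample points are hypothetical, and real use would normally rely on a numerical library rather than this hand-rolled version. MDS is not shown.

```python
import math

# Hypothetical 2-D sample points (x, y).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

# Centre the data on its mean.
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centred = [(x - mean_x, y - mean_y) for x, y in points]

# Sample covariance matrix [[a, b], [b, c]].
a = sum(x * x for x, _ in centred) / (n - 1)
b = sum(x * y for x, y in centred) / (n - 1)
c = sum(y * y for _, y in centred) / (n - 1)

# Largest eigenvalue of a symmetric 2x2 matrix (closed form).
largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# Matching eigenvector: the direction with the most variance.
if b != 0:
    vx, vy = b, largest - a
else:
    vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
norm = math.hypot(vx, vy)
direction = (vx / norm, vy / norm)

# Reduce to one dimension by projecting onto that direction.
scores = [x * direction[0] + y * direction[1] for x, y in centred]

# Fraction of the total variance kept by the first component.
explained = largest / (a + c)
```

The variance of `scores` equals `largest`, which is at least as big as the variance along either original axis; that is the sense in which the first principal component is the direction with the most variance.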