Chapter 2 Flashcards

1
Q

What does exploratory data analysis do and how?

A

It takes the available information and analyses it to summarise the whole data set.

It uses descriptive statistical techniques.

2
Q

How does EDA compare to data mining?

A

EDA uses descriptive statistical techniques, whereas data mining uses both descriptive and inferential methods.

3
Q

What is the purpose of EDA?

A

To describe the structure and the relationships present in the data, for eventual use in a statistical model

4
Q

What is univariate exploratory data analysis?

A

Analysis of individual variables, considered one at a time.

It is an important step in preliminary data analysis.

5
Q

What does univariate data analysis usually consist of?

A

Graphical displays and a series of summary indexes.

6
Q

What kind of graphical univariate analysis is carried out for qualitative nominal data?

A

Bar charts and pie charts

7
Q

What kind of graphical univariate analysis is carried out for ordinal qualitative and discrete quantitative variables?

A

Frequency diagrams (ie bar charts where order on the horizontal axis is important)

8
Q

What are the main unidimensional / univariate statistical indexes?

A
  • Measures of location
  • Measures of variability
  • Measures of heterogeneity
  • Measures of concentration
9
Q

What are the measures of location?

A

Mean, mode and median

10
Q

What is the formula for the mean?

A

x_bar = (1/N) * sum(x_i) — the sum of the N observations divided by N

11
Q

How is univariate data classified?

A

In terms of a frequency distribution

12
Q

What are the measures of variability?

A

The range, interquartile range and variance.

13
Q

What is the range?

A

The difference between the maximum and minimum observations

14
Q

What is the interquartile range?

A

The difference between the third and first quartile

15
Q

What is the formula for the sample variance?

A

s^2 = (1/(N-1)) * sum((x_i - x_bar)^2)

16
Q

What types of data can measures of location and measures of variability be used for?

A

Only for continuous/quantitative data.

They cannot be used for qualitative data.

17
Q

How can the dispersion of qualitative data be calculated?

A

Measures of heterogeneity - the spread of data

18
Q

What is null heterogeneity?

A

When all observations have X equal to the same level (ie they all belong to the same category, so there is no spread).

p_i = 1 for one level i
p_j = 0 for the other k-1 levels

19
Q

What is maximum heterogeneity?

A

When all observations are uniformly distributed among the k levels.

p_i = 1/k for all i = 1, ..., k

20
Q

How can you assess the dispersion of qualitative data?

A
  • Gini index of heterogeneity
  • Entropy
21
Q

What is the formula for the Gini index of heterogeneity?

A

G = 1 - sum(p_i^2), where the sum runs over the k levels

22
Q

What does G = 0 represent?

A

Perfect homogeneity ie everything belongs to the same category

23
Q

What does G = 1 - 1/k, ie (k-1)/k, represent?

A

Maximum heterogeneity - all categories represented equally

24
Q

How do we normalise the Gini index of heterogeneity?

A

By dividing by the maximum value

25
Q

What values does the normalised Gini index take?

A

0 to 1

26
Q

What is the formula for the normalised Gini index?

A

G' = G / ((k-1)/k) — the Gini index divided by its maximum value

27
Q

What is the formula for Entropy?

A

E = - sum(p_i * log(p_i)), where the sum runs over the k levels

28
Q

What does E = 0 represent?

A

Perfect homogeneity

29
Q

What does E = log(k) represent?

A

Maximum heterogeneity

30
Q

What is the normalised index called?

A

The relative index of heterogeneity

31
Q

How do you obtain the normalised index E'?

A

Rescale by the maximum value, log(k):

E' = E / log(k)
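
A minimal sketch of these heterogeneity indexes in Python with NumPy (the chapter prescribes no software, so the language and the example counts are assumptions):

```python
import numpy as np

def heterogeneity(counts):
    """Gini index G, entropy E, and their normalised versions for
    a single qualitative variable given its category counts (k >= 2)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()        # relative frequencies p_i
    k = len(p)
    G = 1.0 - np.sum(p ** 2)         # Gini index of heterogeneity
    G_norm = G / ((k - 1) / k)       # normalised Gini index, in [0, 1]
    nz = p[p > 0]                    # 0 * log(0) is taken as 0
    E = -np.sum(nz * np.log(nz))     # entropy
    E_norm = E / np.log(k)           # relative index of heterogeneity
    return G, G_norm, E, E_norm

print(heterogeneity([25, 25, 25, 25]))   # uniform -> maximum heterogeneity (1.0)
print(heterogeneity([100, 0, 0, 0]))     # one level -> null heterogeneity (0.0)
```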

32
Q

What are the measures of concentration?

A

The Gini Coefficient R (a summary index of concentration)

33
Q

What are measures of concentration?

A

They help understand the concentration of the characteristic among the N quantities

34
Q

What are the two extreme situations which can occur for measures of concentration?

A

Minimum concentration - equal income for all (everyone has the same salary): x1 = x2 = ... = xN = x_bar

Maximum concentration - one unit gets all the income: x1 = x2 = ... = x(N-1) = 0 and xN = N * x_bar

The degree of concentration can lie anywhere between these two extremes

35
Q

What is the equation for the Gini Coefficient R and the relevant steps associated with it?

A

R = sum(F_i - Q_i) / sum(F_i), with both sums running over i = 1, ..., N-1

Steps: sort the N values into non-decreasing order, compute F_i and Q_i, then apply the formula.

36
Q

What are the conditions for the variables investigated for the Gini Coefficient R?

A

There are N non-negative quantities measuring a transferable characteristic (eg a fixed amount of income shared among N individuals), placed in increasing (non-decreasing) order.

37
Q

What is Fi?

A

The cumulative proportion of considered units, up to unit i

38
Q

What is Qi?

A

The cumulative proportion of the characteristic that belongs to the first i units

39
Q

What do you need to remember about the sums associated with the Gini concentration index R?

A

The sums run from i = 1 to N-1; don't include the final value (since F_N = Q_N = 1).
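
A minimal Python/NumPy sketch of R under the same assumptions as before (hypothetical incomes; sums over i = 1 to N-1 as above):

```python
import numpy as np

def gini_R(x):
    """Gini concentration coefficient R for N non-negative transferable
    quantities (eg incomes), using the formula above."""
    x = np.sort(np.asarray(x, dtype=float))    # non-decreasing order
    N = len(x)
    F = np.arange(1, N + 1) / N                # cumulative proportion of units
    Q = np.cumsum(x) / x.sum()                 # cumulative proportion of the characteristic
    # both sums run over i = 1 .. N-1 (F_N = Q_N = 1 is excluded)
    return np.sum(F[:-1] - Q[:-1]) / np.sum(F[:-1])

print(gini_R([10, 10, 10, 10]))   # equal shares -> 0 (minimum concentration)
print(gini_R([0, 0, 0, 40]))      # one unit holds everything -> 1 (maximum)
```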

40
Q

What does R = 0 represent?

A

Minimum concentration

41
Q

What does R = 1 represent?

A

Maximum concentration

42
Q

What is a challenge of multivariate data?

A

The sheer complexity of the information

43
Q

What are the methodologies for exploring and simplifying complex multivariate data?

A
  • Principal components
  • Exploratory factor analysis
44
Q

What is principal components analysis (PCA)?

A

A data-reduction technique that transforms a larger number of correlated variables into a much smaller set of uncorrelated variables called principal components.

45
Q

What is exploratory factor analysis (EFA)?

A

A collection of methods designed to uncover the latent structure in a given set of variables.

It looks for a smaller set of underlying or latent constructs that can explain the relationships among the observed variables.

eg a dataset of 24 variables has intercorrelations that can be explained by 4 underlying factors.

46
Q

What are principal components?

A

Uncorrelated composite variables, used to reduce dimensionality.

They aim to retain as much information from the original set of variables as possible.

They are linear combinations of the observed variables. The weights used to form the linear composites are chosen to maximise the variance each PC accounts for, while keeping the components uncorrelated.

47
Q

What are factors?

A

Factors are assumed to underlie or “cause” the observed variables in exploratory factor analysis, rather than being linear combinations of them.

Errors represent the variance in the observed variables unexplained by the factors.

The factors and errors aren’t directly observable but are inferred from the correlations among the variables.

(In a path diagram of the factor model, curved arrows between factors indicate that they are correlated.)

48
Q

Describe the process of principal component analysis

A

A statistical technique that linearly transforms an original set of p correlated variables into a new set of k uncorrelated variables called principal components. These are a substantially smaller set of variables that represent most of the information in the original set - they maximise the variance accounted for in the original p variables.

49
Q

What order are the principal components derived in?

A

Decreasing order of importance so that the 1st PC accounts for as much as possible of the variation in the original data.

50
Q

What is the objective of PCA?

A

To see if the first few components account for most of the variation in the original data. If they do, then it is argued that the effective dimensionality of the problem is less than p (the original number of correlated variables).

51
Q

What are the goals of PCA?

A

Reduce the dimensionality of the original data set

A smaller set of uncorrelated variables is much easier to understand and use in further analysis than a larger set of correlated variables.

52
Q

What does reducing the dimensionality of the problem do?

A

Simplifies the complexity of the data.

Makes it easier to visualise.

53
Q

What do PCs reveal about the structure of the data?

A

Principal components are the underlying structure in the data.

They are the directions where there is the most variance, the directions where the data is most spread out.

54
Q

How do we use mathematics to find the principal components?

A

Using eigenvectors and eigenvalues - we can decompose the set of data points into eigenvectors and eigenvalues.

55
Q

What is the relationship of eigenvectors to each other?

A

Orthogonal to each other.

The eigenvectors have to be able to span the whole [x-y] space. In order to do this most effectively, the directions need to be orthogonal.

The eigenvectors then provide a much more useful set of axes to frame the data in.

56
Q

What do eigenvectors and eigenvalues represent?

A

Eigenvector - a direction
Eigenvalue - a number telling us how much variance the data has in that direction, ie how spread out the data is along that line.

57
Q

Which eigenvector is the principal component?

A

The eigenvector with the highest eigenvalue

58
Q

How many eigenvectors/values exist?

A

The same number of dimensions that the data set has.

The eigenvectors put the data into a new set of dimensions, so the number of new dimensions has to equal the number of original dimensions.

59
Q

Does investigating eigenvectors change the data?

A

No, we are just looking at it from a different angle.

We are shifting from one set of axes to another. We rearrange the axes to be along the eigenvectors.

These new axes are much more intuitive to the shape of the data.

60
Q

How do the eigenvectors compare to the original axes?

A

They are more intuitive to the shape of the data.

However, the original axes were well defined (we explicitly measured those things), and the new axes are not.

There is often a good reason why the new axes represent the data better, but the maths won’t tell us why. Data scientists have to work out the meanings of the new axes.

61
Q

What is the main way PCA and eigenvectors help in the analysis of data?

A

Dimensionality reduction.

62
Q

What does dimension reduction do?

A

Reduces the data down into its basic components, stripping away any unnecessary parts.

63
Q

What matrix do we use to carry out PCA?

A

The correlation matrix, R

This is the variance-covariance matrix of the standardised variables.

We have to standardise the matrix of data X (with n rows and p columns) to give a matrix Z in which each column has mean 0 and variance 1.
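
A minimal Python/NumPy sketch of this standardisation (the language and the randomly generated data matrix X are assumptions, not part of the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # hypothetical data: n = 100 rows, p = 4 columns

# Standardise each column to mean 0 and variance 1, giving Z
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# The correlation matrix R of X is the variance-covariance matrix of Z
R = (Z.T @ Z) / len(Z)               # equivalently: np.corrcoef(X, rowvar=False)
```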

64
Q

What is the first principal component of the data matrix Z described by?

A

It is a vector described by a linear combination of the variables.

In matrix terms: Y1 = Z * a1

Z - the original standardised matrix
a1 - the vector of coefficients (weights)

65
Q

How are the weights chosen?

A

Chosen to maximise the variance of the variable Y1.

The variance of Y1 is maximised when the weights are chosen to be the eigenvector corresponding to the largest eigenvalue of the correlation matrix.

66
Q

How is the 1st PC determined?

A

By the vector of weights a1 such that the variance of Y1 is maximised, under the constraint a1' a1 = 1

(the inner product of a1 with itself is 1, ie a1 has unit length)

67
Q

How is the 2nd PC determined?

A

Y2 = Za2

where the vector of coefficients is chosen in such a way that the variance of Y2 is maximised, under the constraints a2' a2 = 1 and a2' a1 = 0 (the weight vectors are orthogonal)

It can be shown that a2 is the eigenvector (normalised and orthogonal to a1) corresponding to the second largest eigenvalue of R.

68
Q

What is the general formula for the vth principal component for v = 1, … , k?

A

It is the linear combination

Yv = Zav

in which the vector of coefficients av is the eigenvector of R corresponding to the vth largest eigenvalue.
This eigenvector is normalised and orthogonal to all the previous eigenvectors.
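
Continuing the hypothetical Z and R from the earlier sketch, the components and their scores could be extracted like this (a sketch, not a prescribed procedure):

```python
import numpy as np

# Continuing from Z and R above: eigendecomposition of the correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)     # eigh, since R is symmetric
order = np.argsort(eigvals)[::-1]        # sort so the largest eigenvalue comes first
eigvals, A = eigvals[order], eigvecs[:, order]

# Column v of A is the weight vector a_v; the vth PC scores are Y_v = Z a_v
Y = Z @ A
print(np.allclose(Y.var(axis=0), eigvals))   # Var(Y_v) = lambda_v holds
```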

69
Q

What is the variance of the vth principal component?

A

The vth eigenvalue:
Var(Yv) = lambda_v

70
Q

What does the covariance between the principal components satisfy?

A

Cov(Yi, Yj) = 0 for i != j

It is 0 because the components are uncorrelated - their weight vectors are orthogonal.

71
Q

How can you describe the variance-covariance matrix of Y?

A

A diagonal matrix - lambda_1, ..., lambda_k appear along the diagonal and all other entries are 0.

72
Q

What is the proportion of variability that is maintained in the transformation from the original p variables to k <= p components?

A

(1/p) * sum(lambda_i), summed over the k retained components

eg 10 original variables reduced to 3 PCs:
proportion of variability maintained by the 3 PCs = (lambda_1 + lambda_2 + lambda_3) / 10

This equation expresses a cumulative measure of the quota of variability “reproduced” by the first k components, with respect to the overall variability present in the original data matrix.

Therefore it can be a measure of the importance of the chosen k PCs, in terms of the “quantity of information” maintained in passing from p variables to k components.
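
Continuing the same sketch, the retained proportion is one line:

```python
# Continuing the sketch: share of total variance kept by the first k components
k = 3
retained = eigvals[:k].sum() / len(eigvals)   # the eigenvalues sum to p
print(f"the first {k} PCs retain {retained:.1%} of the total variance")
```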

73
Q

How do we calculate the relative importance of each PC (with respect to the original variables)?

A

Consider the general covariance between a PC and the original standardised variables Z.

This helps us interpret what the new PCs mean - if PC1 is highly correlated with age, we know it tells us a lot about age.

74
Q

What is the linear correlation between a PC (Yj) and the original standardised variable (Zi)?

A

Corr(Yj, Zi) = sqrt(lambda_j) * a_ij

The linear correlation between PC Yj and the original (unstandardised) variable Xi is the same:
Corr(Yj, Xi) = Corr(Yj, Zi) = sqrt(lambda_j) * a_ij
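
Continuing the same sketch, the loading matrix and the per-variable portion of variability (the row sums of squares discussed later) follow directly:

```python
import numpy as np

# Continuing the sketch: loadings[i, j] = Corr(X_i, Y_j) = sqrt(lambda_j) * a_ij
loadings = A * np.sqrt(eigvals)      # scales column j of A by sqrt(lambda_j)

# Row sums of squared loadings over the first k columns give the portion of
# each original variable's variance maintained by the k retained components
k = 2
portion = (loadings[:, :k] ** 2).sum(axis=1)
```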

75
Q

What is the loading?

A

The algebraic sign and value of the coefficient a_ij.

This determines the sign and strength of the correlation between the jth PC and the ith original variable.

76
Q

What is the formula for the portion of variability of an original variable Xi explained by k principal components?

A

Portion of variability = sum(lambda_j * a_ij^2), summed over the k retained components j

The portion of variability is the sum of the square of the appropriate correlation terms.

This describes the quota of variability of each explanatory variable that is maintained in passing from the original variables to the principal components.

77
Q

What does knowing the portion of variability enable?

A

We can interpret each PC by referring it mainly to the variables with which it is strongly correlated.

78
Q

If all correlations are high between variables (as displayed in a correlation matrix), what might we suspect?

A

That a few principal components would explain most of the variation in the original variables.

79
Q

What should all the eigenvalues add up to?

A

The total number of original variables, p - the variance of the original (standardised) variables has been redistributed among the components.

80
Q

How do you calculate the percentage of the original variance that a principal component accounts for?

A

Take the eigenvalue for that PC and divide it by p, the total number of original variables (which equals the sum of all the eigenvalues).

81
Q

What do the eigenvalues represent?

A

The variance accounted for by each principal component

82
Q

How do we compute principal component scores?

A

Yv = Z * av

where av is the vector of coefficients, the eigenvector of R corresponding to the vth largest eigenvalue

83
Q

What is sometimes referred to as component loading?

A

The correlations of the original variables with the principal components

84
Q

How do you obtain the correlations of the original variables with the principal components?

A

Corr(Xi, Yj) = sqrt(lambda_j) * a_ij

ie multiply the elements of each eigenvector by the square root of the corresponding eigenvalue.

85
Q

What does a principal component loading of relatively large magnitude (approaching 1) indicate?

A

It indicates that the corresponding original variable was important in defining that particular principal component.

86
Q

If the original variables are all positively correlated with a particular PC, what will the elements of the corresponding eigenvector be?

A

They will all be positive

87
Q

If the correlations of the original variables and a particular PC are all of the same magnitude, then what can be said of the elements in the corresponding eigenvector?

A

They will be of about the same magnitude.

That is if we were to create a principal component score using the elements of the eigenvector, it would essentially be an equally weighted average of each variable.

88
Q

What is the size factor?

A

The first (largest) principal component when, as in the situation above, all the variables load on it positively and with similar weights - it then measures the overall “size” of each observation.

89
Q

Why might the second PC be less uniform than the first?

A

It has to be orthogonal to the first (and all other) PCs. For this to be fulfilled, the sum of the cross-products of the elements of one PC's eigenvector with another's must be zero.

So, since the first PC has all positive values, the second must be a mix of positive and negative values.

When we have an overall size factor, the succeeding principal components with alternating positive and negative signs are usually interpreted as contrasts.

90
Q

Why are the first few PCs in many instances more interpretable?

A

They explain most of the variation in the set of variables.

Frequently, the smaller components are more difficult to interpret as to what they represent.

91
Q

Why is it useful to examine the sum of the squares of the loadings of each row of the principal component loading matrix?

A

The row sum of squares indicates how much variance for that variable is accounted for by the retained PCs.

Eg if we retain 2 PCs, add together the two corresponding squared loadings in the row to see how much variation is maintained (cumulative)

92
Q

How many principal components should you choose?

A

We want to choose enough to adequately represent the data. There are many different criteria suggested for doing this (see the sketch after this list):
- Include any component with an eigenvalue (lambda_i) >= 1
- Cattell's scree criterion
- Retain enough PCs to account for at least X% of the overall variability
- Retain enough PCs to account for at least X% of the variation in each variable
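
A sketch of the first two criteria, continuing from the eigvals of the earlier sketch (matplotlib is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

# Continuing the sketch: the eigenvalue >= 1 rule
print("eigenvalue >= 1 rule keeps", int((eigvals >= 1).sum()), "components")

# Scree plot for Cattell's criterion: keep the steep part, discard the flat part
plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
plt.xlabel("component number")
plt.ylabel("eigenvalue")
plt.title("Scree plot - look for the elbow")
plt.show()
```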

93
Q

Describe the rationale for including any component that has an eigenvalue (lambda_i) >= 1

A

The overall variance of the standardised data is equal to p, so the average variance per component is 1; a retained component should account for at least this average.

94
Q

What is Cattell’s scree criterion?

A

Look at the scree plot of the ordered eigenvalues.

The components which make up the steepest part of the curve are included, whilst those on the flatter part are discarded.

Sometimes the elbow itself is included and sometimes it is not.

95
Q

What is the point of separation of the Scree plot called?

A

An elbow

96
Q

What does the number of components retained depend on?

A

The eventual use of the PCs

If they were going to be used as independent variables in a regression analysis, then we might want to retain components such that all the variables are adequately represented by the PCs.

97
Q

What are principal component scores calculated for?

A

Individual rows

98
Q

Why would you plot the principal component scores for any pair of principal components?

A

To check for outlying observations, to search for clusters and, in general, to understand the structure of the data.

99
Q

What are reasons data can be missing?

A
  • Participants forget to answer one or more questions
  • Refuse to answer sensitive questions
  • Grow fatigued and fail to complete a long questionnaire
  • Study participants miss appointments or drop out
  • Recording equipment fails
  • Data miscoded
  • Data may be lost for reasons you may never be able to ascertain
100
Q

What do most statistical methods assume?

A

That you are working with complete data

101
Q

What is a comprehensive approach for data handling?

A
  • Identify the missing data
  • Examine the causes of the missing data
  • Delete the cases containing missing data, or replace (impute) the missing values with reasonable alternative data values
102
Q

What are the three missing data-mechanisms defined in Rubin’s classification system?

A
  • Missing completely at random (MCAR)
  • Missing at random (MAR)
  • Missing not at random (MNAR)

Which mechanism applies depends on how the missing-data process is related to the underlying hypothetical complete data.

103
Q

When are the data described as missing completely at random (MCAR)?

A

If the missingness of a variable Y is unrelated to either the value of Y or that of other measured variables.

ie the observed data points are a simple random sample of the data had the data been complete. Missing cases are no different than non-missing cases.

These can be thought of as randomly missing. The only real penalty for failing to account for the missing data is loss of power.

104
Q

What are some examples of MCAR?

A
  • Answers are missing at random throughout a survey
105
Q

When are the data described as missing at random (MAR)?

A

When the missingness of a variable Y is unrelated to the value of Y itself after conditioning on other observed values.

ie missing data depends on known values and thus it is fully described by variables in the dataset. Accounting for the values which “cause” the missing data will produce unbiased results in an analysis.

106
Q

What are some examples of MAR?

A
  • An education survey where the GCSE question was left blank by respondents educated outside the UK - the missingness is explained by another observed variable (country of education)
107
Q

When are data described as missing not at random (MNAR)?

A

When the missingness of a variable Y still depends on the value of Y even given the observed variables.

When data is missing in an unmeasured fashion, this is also termed “non-ignorable”. Since the missing data depends on events or items which the researcher has not measured this is a damaging situation.

The reason for the missingness can't be inferred from the dataset.

108
Q

Which of the mechanisms is it possible to verify?

A
  • Can verify whether data are MCAR or not
  • Impossible to test the MAR mechanism (with exception)
109
Q

Which mechanism is assumed in many popular techniques?

A

MAR

110
Q

Describe accessible vs inaccessible missing data mechanisms?

A
  • An accessible mechanism is one where the cause of missingness can be accounted for
  • MCAR and most MAR
  • An inaccessible mechanism is one where the missing data mechanism cannot be measured
  • Nonignorable (MNAR) mechanisms and some MAR mechanisms

Often the missing data mechanism is made up of both accessible and inaccessible factors.

111
Q

What can you do to evaluate potential mechanisms leading to missing data and the impact of missing data on your ability to answer questions?

A

Identify the amount, distribution and pattern of missing data

  • What % of data is missing?
  • Are the missing data concentrated in a few variables or widely distributed?
  • Do the missing values appear to be random? - this may be seen when plotted
  • Does the data suggest a possible mechanism that’s producing the missing values?
112
Q

If missing data are concentrated in a few relatively unimportant variables, what can you do?

A

Delete the variables and continue the analysis normally

113
Q

If a small amount of data (< 10%) are randomly distributed throughout the dataset (MCAR) what can you do?

A

Limit the analysis to cases with complete data and still get reliable and valid results

114
Q

If you have MCAR or MAR data, what can you do?

A

Apply multiple imputation methods and arrive at valid conclusions

115
Q

If the data are MNAR what can you do?

A

Turn to specialised methods or collect new data

116
Q

What are some examples of MNAR?

A
  • A depression study where questions about depressed mood are omitted by depressed and older participants
  • Cancer questionnaire where < 10 was omitted from dataset
117
Q

What are the three most popular approaches for dealing with missing data?

A
  • A rational approach for recovering data
  • A traditional approach involving deleting missing data or simple data imputation
  • A modern approach that is based on simulations to perform the missing data imputation
118
Q

What is the rational approach for recovering data?

A

Using mathematical or logical relationships among variables to attempt to fill in or recover missing values. It may be exact or approximate.

  • Eg when variables are linked by an equation, one can be calculated from the others
  • Eg date of birth and age etc (see the sketch below)
  • Eg using numerical answers to answer binary questions
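
A minimal pandas sketch of the date-of-birth/age example (pandas, the toy data and the reference date are assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"dob": pd.to_datetime(["1990-05-01", "1985-11-12"]),
                   "age": [np.nan, 39.0]})

as_of = pd.Timestamp("2025-01-01")   # hypothetical reference date
recovered = ((as_of - df["dob"]).dt.days / 365.25).astype(int)
df["age"] = df["age"].fillna(recovered)   # exact-to-the-year recovery
```
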
119
Q

What are traditional approaches for dealing with incomplete data?

A

List-wise and pair-wise deletion methods

The advantage of these methods is that they are convenient and are standard options.

120
Q

What is list wise deletion?

A

A case is dropped from an analysis because it has a missing value in at least one of the specified variables.

121
Q

What is pair wise deletion?

A

An approach which uses cases that contain some missing data.

eg computing the correlation between two columns using the subset of cases in which both of those columns are observed
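
A minimal pandas sketch contrasting the two (pandas and the toy data frame are assumptions; pandas' corr() happens to use pair-wise complete observations):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0],
                   "b": [2.0, np.nan, 6.0, 8.0],
                   "c": [1.0, 3.0, 5.0, np.nan]})

listwise = df.dropna()    # list-wise: keeps only rows complete on every variable
pairwise = df.corr()      # pair-wise: each correlation uses all rows where both columns are observed
```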

122
Q

What are many multivariate techniques, eg multivariate regression, computationally based on?

A

A correlation matrix (pair-wise correlations of all variables) that is computed as the first step

123
Q

What does list-wise deletion assume?

A

MCAR data

If this assumption does not hold it can produce distorted parameter estimates

It is not recommended unless the portion of missing data is very small.

124
Q

What are the advantages/disadvantages of pair-wise deletion?

A
  • It allows you to use more of your data
  • Each computed statistic may be based on a different subset of cases, which can be problematic - the separate results may not be mutually consistent
125
Q

What is simple imputation?

A

You substitute a value for each missing value. Standard statistical procedures for complete data analysis can then be used with the filled-in data set.

  • Eg impute the variable mean of the complete cases
  • Eg impute the mean conditional on observed values of other variables

Simple imputation does not reflect the uncertainty about the predictions of the unknown missing values, and the resulting estimated variances of the parameter estimates will be biased toward zero.
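
A minimal pandas sketch of unconditional mean imputation (pandas and the toy data are assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [23.0, np.nan, 31.0, 27.0],
                   "income": [30.0, 42.0, np.nan, 38.0]})

# Unconditional mean imputation: fill each hole with its column's mean.
# As noted above, this ignores imputation uncertainty, so estimated
# variances will be biased toward zero.
filled = df.fillna(df.mean())
```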

126
Q

What is the modern approach for dealing with incomplete data based on?

A

Simulations

127
Q

What is multiple imputation (MI)?

A

An approach to dealing with missing values based on repeated simulations.

Instead of filling in a single value for each missing value, replace each missing value with a set of plausible values that represent the uncertainty about the right value to impute.

The multiply imputed data sets are then analysed using standard procedures for complete data, and the results are combined.

128
Q

How many datasets are usually generated in multiple imputation (MI)?

A

3 - 10

129
Q

How many distinct phases are there of multiple imputation inference and what are they?

A

Three
1 - the missing data are filled in m times to generate m complete data sets
2 - the m complete data sets are analysed by using standard procedures
3 - the results from the m complete data sets are combined for the inference
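
A sketch of the three phases (scikit-learn's IterativeImputer with sample_posterior=True is one way to generate the m completed data sets; the toy data and the plain averaging in phase 3 are simplifications - a full analysis would pool with Rubin's rules):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan    # punch ~10% random holes (MCAR)

m = 5
estimates = []
for i in range(m):
    # Phase 1: draw one completed data set; sample_posterior=True samples
    # from a predictive distribution, giving between-imputation variability
    completed = IterativeImputer(sample_posterior=True,
                                 random_state=i).fit_transform(X)
    # Phase 2: analyse each completed data set with a standard procedure
    estimates.append(completed.mean(axis=0))

# Phase 3: combine the m results for the inference
pooled = np.mean(estimates, axis=0)
```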

130
Q

What are imputation methods dependent on?

A

Types of missing data pattern / types of missingness

131
Q

What is used for monotone missing data patterns?

A
  • Parametric regression method - assumes multivariate normality
  • Nonparametric method which uses propensity scores
132
Q

What is used for an arbitrary missing data pattern?

A

A Markov chain Monte Carlo (MCMC) method that assumes multivariate normality.

This creates multiple imputations by using simulations from a Bayesian prediction distribution for normal data.

Another way to handle a data set with arbitrary missing data pattern is to use the MCMC approach to impute enough values to make the missing data pattern monotone.

133
Q

What is a monotone missing pattern?

A

When variable Y being missing for individual i implies that all subsequent variables are also missing for individual i.

You have greater flexibility in your choice of strategies.
You can implement a regression model without involving iterations, as MCMC requires.