Latent Class Analysis Flashcards

1
Q

What is the main goal of latent class analysis?

A

To categorise people into discrete types, e.g. answering "which category do you belong to?"

2
Q

Describe a latent class profile

A

The items go along the x-axis while the probability of scoring positively on each item goes along the y-axis. A line is then drawn for each class, giving that class's probability for each item.

3
Q

What data is applicable to latent class analysis?

A

Categorical latent data and categorical observed data.

The latent variable, θ, is categorical
• e.g., θ = 1, θ = 2, θ = 3, or θ = "avoidant", θ = "anxious", θ = "secure"

4
Q

Give some examples of categorical latent variables in psychology

A
  • Developmental stages (Piaget)
  • Attachment types A,B,C,D
  • Deviant behavior: Autism, Dyslexia
  • Learning styles
  • Personality types
  • Types of anti-social behaviour
  • Mastery versus non-mastery of a skill
5
Q

How do some researchers (who aren’t super into methods) categorise subjects? What problems arise here? (4)

A
Using their own ad-hoc criteria (cutting off continuous data at certain points to create categories):
• Criteria are arbitrary
• You cannot falsify the classes
• You cannot find new classes
• There is no explicit, falsifiable model

You need a clear theory that justifies the cut-off points; without one, this way of categorising is very difficult to defend.

6
Q

What are the categories referred to as?

A

Each category is called a "latent class". A subject belongs to exactly one class.

7
Q

What assumption is there about the items within each class?

A

Within each class, the items are independent (local independence)

8
Q

Therefore, what is the goal of LCA?

A

Classify the subjects to the latent classes on the basis of the observed item scores

9
Q

Compare the IRT model function to the LCA model function

A

In IRT, the conditional probabilities, P(Y_pi = 1 | θ_p), are given by a curve, as θ is continuous. The curve is characterised by the item parameters.

In LCA, the conditional probabilities, P(Y_pi = 1 | θ_p = t), are single points, as the latent variable is categorical. These conditional probabilities are themselves the item parameters. (see doc). When graphed, the classes go along the x-axis and the conditional probabilities along the y-axis, but a table is more commonly used.

10
Q

Describe what you feed to R in LCA (or what R makes for you) in place of the covariance matrix used in factor analysis

A

There is a column of score patterns listing the possible outcomes (e.g. 0000, all wrong; 0110, A and D wrong, B and C correct). Beside it is a column labelled Fijkl listing how often each pattern was observed in the data: if pattern 0000 has 93 beside it, then 93 participants scored all 4 items incorrectly. The subscripts of Fijkl index the responses, e.g. i is the response to item A (i = 0 or i = 1).

Instead of a covariance matrix, there are M independent pieces of information (number of response patterns minus 1):
M = C^n − 1
where C is the number of categories and n is the number of items. So with 2 categories and four items:
M = C^n − 1 = 2^4 − 1 = 15 independent pieces of information
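As a quick sketch (plain Python rather than the course's R script), the response patterns and the count M = C^n − 1 can be generated like this:

```python
from itertools import product

# All response patterns for n dichotomous items (C = 2 categories).
n_items = 4
patterns = ["".join(map(str, p)) for p in product([0, 1], repeat=n_items)]
print(patterns[:3])  # ['0000', '0001', '0010']

# There are C^n patterns, of which M = C^n - 1 are independent pieces
# of information (the last frequency follows from the total N).
C = 2
M = C**n_items - 1
print(len(patterns), M)  # 16 15
```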

11
Q

𝐶^𝑛 = ?

Why do we subtract 1?

A

𝐶^𝑛 = the number of response patterns

We subtract 1 because, if we know all but one of the frequencies, the last one can be calculated from the total number of observations. It is therefore not an independent piece of information.

12
Q

Describe the typical output of a latent class model

A

A table with the classes as rows; the first column gives the class size and the remaining columns correspond to the items. Each cell then holds the conditional probability of a positive response on that item given that class.

13
Q

What is meant by the class size?

A

The proportion of people in that class(ification)

14
Q

Describe a probability notation for the equation used in LCA

A

The probability of being in class t and observing response vector [ijkl] is given by:
P(X_p = [ijkl] & θ_p = t) = P(θ_p = t) × P(X_p1 = i | θ_p = t) × P(X_p2 = j | θ_p = t) × P(X_p3 = k | θ_p = t) × P(X_p4 = l | θ_p = t)
i.e.
the probability of a pattern and being in a class = the probability of being in the class (the class size) × the probability of that answer pattern within the class (e.g. prob of 1 on A × prob of 0 on B, etc.)
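A minimal numeric sketch of this product, with made-up class size and conditional probabilities (Python rather than R):

```python
# Joint probability of class t and pattern [i j k l]:
# P(X_p = [ijkl] & theta_p = t) = P(theta_p = t) * prod over items.
# All numbers below are illustrative assumptions, not real estimates.
class_size = 0.4                  # P(theta_p = t)
cond_prob = [0.9, 0.8, 0.7, 0.6]  # P(item = 1 | theta_p = t) per item
pattern = [1, 0, 1, 1]            # observed responses i, j, k, l

p = class_size
for resp, pi in zip(pattern, cond_prob):
    p *= pi if resp == 1 else (1 - pi)  # P(0 | t) = 1 - P(1 | t)
print(round(p, 4))  # 0.0302
```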

15
Q

Explain the formula given in the book for this calculation

A

They use unusual notation:
π(ABCDX, ijklt) = π(X,t) · π(A|X,it) · π(B|X,jt) · π(C|X,kt) · π(D|X,lt)

π refers to a probability
X is the latent variable (instead of θ)

π(X,t) is the class size: the probability of being at level t of X. π(X,1) is the probability of being in class 1.

π(A|X,it) is the conditional probability of response i on item A given level t. π(A|X,11) is the probability of scoring correctly on item A given that you're in class 1. (table in docs)

16
Q

What are the main approaches to parameter estimation in LCA (2) and what approach will we use?

A

Bayesian estimation and maximum likelihood

We will do maximum likelihood using the 'lca.r' R package developed by Han van der Maas

17
Q

How do you calculate the maximum likelihood?

A

If the probability of observing response vector [ijkl] within class t is the product of the response probabilities (response A × response B × …),

and the probability of observing response vector [ijkl] AND class t is the class probability × the probability of response A × response B × …,

then the overall probability of observing [ijkl] is found by summing the joint probability over the classes:
π(ABCD, ijkl) = Σt π(X,t) · π(A|X,it) · π(B|X,jt) · π(C|X,kt) · π(D|X,lt)

So for the probability of scoring 1111, you multiply the class size and the response probabilities for class 1, do the same for class 2, and add the results.
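The same sum over classes in Python, with assumed (purely illustrative) two-class parameter values:

```python
# Marginal probability of a pattern: sum the joint probability over
# all classes. Class sizes and conditional probabilities are made up.
class_sizes = [0.4, 0.6]
cond_probs = [
    [0.9, 0.8, 0.7, 0.6],  # P(item = 1 | class 1)
    [0.2, 0.3, 0.1, 0.4],  # P(item = 1 | class 2)
]
pattern = [1, 1, 1, 1]

total = 0.0
for size, probs in zip(class_sizes, cond_probs):
    joint = size                      # class size pi(X, t)
    for resp, pi in zip(pattern, probs):
        joint *= pi if resp == 1 else (1 - pi)
    total += joint                    # sum over t
print(round(total, 5))  # 0.1224
```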

18
Q

What is the purpose of the log-likelihood function?

A

The likelihood of the data is a product of pattern probabilities, each raised to the power of its observed frequency (Fijkl), so it becomes extremely small. Taking the log turns this product into a sum, which is much easier to deal with numerically.
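A small sketch of the contrast, using toy observed frequencies and model probabilities (assumed numbers, not real data):

```python
import math

# log L = sum over patterns of F * log(pi), versus the raw
# likelihood prod(pi ** F). Frequencies and probabilities are made up.
freqs = {"00": 40, "01": 10, "10": 20, "11": 30}          # observed F
probs = {"00": 0.42, "01": 0.08, "10": 0.18, "11": 0.32}  # model pi

# The raw likelihood is astronomically small...
raw = 1.0
for pat, F in freqs.items():
    raw *= probs[pat] ** F
print(raw)  # extremely small

# ...while the log-likelihood is a manageable sum.
logL = sum(F * math.log(probs[pat]) for pat, F in freqs.items())
print(round(logL, 2))  # -128.44
```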

19
Q

What two algorithms can be used for maximisation?

A

Expectation Maximization (EM) algorithm

Newton-Raphson algorithm

20
Q

Why is knowing these calculation steps useful?

A

1) understanding how you estimate the parameters of a latent class model
2) This process will be used to assess model fit in a bit

21
Q

What two methods are described to maximise the likelihood?

A

Expectation Maximization (EM) algorithm

Newton-Raphson algorithm

22
Q

What two steps are involved in the expectation maximisation algorithm?

A
  1. E step:
    Determine the expected values of the latent classes given some initial values for the other parameters
  2. M step:
    Maximize the likelihood using these expected values to obtain new values for the other parameters

Iterate between 1 and 2 until the parameter values barely change anymore. When they stay around the same value you know the model has converged.
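A compact sketch of these two steps for a two-class model on simulated dichotomous data (plain NumPy, not the lca.r implementation; all parameter values and the data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 500 subjects from two classes with different item probabilities.
true_p = np.array([[0.9, 0.9, 0.8, 0.8],
                   [0.2, 0.1, 0.2, 0.3]])
z = rng.integers(0, 2, size=500)                   # true class per subject
X = (rng.random((500, 4)) < true_p[z]).astype(float)

# Initial values for class sizes and conditional probabilities.
sizes = np.array([0.5, 0.5])
p = np.array([[0.6, 0.6, 0.6, 0.6],
              [0.4, 0.4, 0.4, 0.4]])

for _ in range(200):
    # E step: posterior class memberships given current parameters.
    like = np.stack([
        sizes[t] * np.prod(p[t]**X * (1 - p[t])**(1 - X), axis=1)
        for t in range(2)
    ], axis=1)                                     # N x 2 joint probabilities
    post = like / like.sum(axis=1, keepdims=True)
    # M step: update the parameters using these expected memberships.
    sizes = post.mean(axis=0)
    p = (post.T @ X) / post.sum(axis=0)[:, None]

print(np.round(sizes, 2), np.round(p, 2))  # close to true_p (up to label order)
```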

23
Q

What is involved in the Newton-Raphson algorithm?

A

Approximates the log-likelihood function locally using linear functions to find directions towards the maximum.

E.g. start somewhere random and use a linear approximation to figure out which direction to go; move in that direction, make a new linear approximation there, and so on.

24
Q

How do we scale the latent variable in LCA?

A

For the first time in this course we don't need to: the latent variable already has a scale defined by the number of categories.

25
Q

How do we carry out statistical identification in LCA?

A

• The number of parameters should not exceed the number of independent pieces of information
- In LCA this can happen if you have too few observed variables for your model

26
Q

How do you calculate the df in LCA?

A

df = M − k
• M: number of independent pieces of information
• k: number of parameters

M = C^n − 1 (discussed before)
k: count manually

For instance, fit a 3-class model to 5 dichotomous items:
M = 2^5 − 1 = 31 (C = 2 categories raised to n = 5 items, minus 1)
k = 5 × 3 (conditional probabilities) + 3 − 1 (class probability parameters) = 17
df = 31 − 17 = 14

(5 items × 3 classes conditional probabilities, plus 3 − 1 = 2 class probabilities: the last class size can be calculated from the other two, so it is not a free parameter.)
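This count can be sketched as a small helper (Python; the function name is my own, not from the course materials):

```python
# df for an LCA model with dichotomous-or-more items:
# M = C**n - 1 independent pieces of information, minus k parameters.
def lca_df(n_items, n_classes, n_cats=2):
    M = n_cats**n_items - 1
    # (C - 1) conditional probabilities per item per class, plus the
    # class sizes, which sum to 1, so one of them is not free.
    k = n_items * (n_cats - 1) * n_classes + (n_classes - 1)
    return M - k

# The example from the card: 3 classes, 5 dichotomous items.
print(lca_df(5, 3))  # 31 - 17 = 14
```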

27
Q

What other expected value is left to calculate in this model?

A

Expected number of subjects

28
Q

How do you calculate the expected number of subjects

A

The goal is an expected value for how many participants produce a particular pattern, e.g. 0000.

For this you multiply the model-implied probability of the pattern by the sample size:
π(ABCD, ijkl) × N

This is the fijkl column in the R output.
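Numerically, with assumed two-class parameters (Python sketch, made-up values):

```python
# Expected frequency for a pattern: model probability times N.
N = 1000
class_sizes = [0.5, 0.5]
cond = [[0.8, 0.8, 0.8, 0.8],   # P(item = 1 | class 1), assumed
        [0.2, 0.2, 0.2, 0.2]]   # P(item = 1 | class 2), assumed
pattern = [0, 0, 0, 0]

prob = 0.0
for size, ps in zip(class_sizes, cond):
    joint = size
    for resp, pi in zip(pattern, ps):
        joint *= pi if resp else (1 - pi)
    prob += joint                # pi(ABCD, 0000)
expected = prob * N              # expected number of 0000 patterns
print(round(expected, 1))  # 205.6
```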

29
Q

What does the goodness of fit statistic calculate?

A

Χ² = Σ (Fijkl − fijkl)² / fijkl

i.e. the squared difference between the observed number of subjects (Fijkl) and the expected number of subjects (fijkl), divided by the expected number. Summing these values over the response patterns gives the goodness-of-fit statistic (Χ²).

30
Q

What does it mean if the goodness of fit statistic is significant?

A

Your observations differ significantly from what you would expect under the model. You hope that it is non-significant.

31
Q

How else can you calculate the goodness of fit?

A

G² = 2 Σ Fijkl × log(Fijkl / fijkl)

i.e. twice the sum, over response patterns, of the observed number of subjects multiplied by the log of the observed number divided by the expected number.

Asymptotically these are the same: as the sample size approaches infinity the two statistics converge, but in practice they are often different.
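Both statistics from the same observed and expected frequencies (Python sketch; the counts are toy numbers):

```python
import math

# Pearson X^2 and likelihood-ratio G^2 from observed (F) and
# expected (f) pattern frequencies. Counts below are made up.
F = [93, 40, 35, 32]          # observed frequencies Fijkl
f = [90.0, 42.0, 38.0, 30.0]  # expected frequencies fijkl

X2 = sum((Fi - fi)**2 / fi for Fi, fi in zip(F, f))
G2 = 2 * sum(Fi * math.log(Fi / fi) for Fi, fi in zip(F, f))
print(round(X2, 3), round(G2, 3))  # similar but not identical values
```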

32
Q

Name three comparative fit measures, what they require and how to calculate them

A

• Likelihood ratio test (nested models):
  • χ² = −2(logL_constrained − logL_unconstrained)
    with df = k_unconstrained − k_constrained
  or equivalently
  • χ² = G²_constrained − G²_unconstrained
    with df = df_constrained − df_unconstrained
  • Don't use the Pearson χ² for this

  • Akaike Information Criterion (AIC)
    • AIC = −2logL − 2 × df
    • Compare to a competing model; no nesting necessary
  • Bayesian Information Criterion (BIC)
    • BIC = −2logL − log(N) × df
    • Compare to a competing model; no nesting necessary

Note that here df is used (not k as in factor analysis)
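The AIC and BIC formulas as given on the card can be written directly (note this follows the course's df-based convention, which differs from the more common AIC = −2logL + 2k form; the numbers in the comparison are assumed):

```python
import math

# AIC and BIC exactly as on the card (course convention: uses df).
def aic(logL, df):
    return -2 * logL - 2 * df

def bic(logL, df, N):
    return -2 * logL - math.log(N) * df

# Toy comparison for one model: lower values indicate better fit
# relative to a competing model (logL, df and N are made up).
print(aic(-1200.5, 14), bic(-1200.5, 14, 500))
```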

33
Q

How would you obtain the model statistics from your analysis in R, given that you saved your model as 'res'?

A

summary(res)

34
Q

In docs there is a screenshot of LCA R output; describe the relevant information

A

It gives statistics for an 'estimated model' and a 'saturated model'. The estimated model is the model you created; the saturated model has a separate parameter for each response pattern (df = 0) and is therefore a perfect fit, as it reproduces your dataset exactly.

It first gives the number of parameters and the log-likelihood of each model, both previously explained. You typically look at the statistics for the estimated model. It also gives the BIC for model comparison.

It then gives the likelihood ratio test between the estimated and saturated model, and the Pearson chi-square.

35
Q

Contrast the parameters involved in IRT, FA and LCA

A
IRT has the following parameters:
• Difficulty 
• Discrimination 
• Guessing
• (Factor mean)
• (Factor variance)
FA has the following parameters:
• Factor loadings
• Residual variances
• Factor variance
• (Factor mean)

LCA has the following parameters:
• Conditional probabilities
• Class probabilities

36
Q

Contrast the absolute model fit statistics involved in IRT, FA and LCA

A

IRT:
Q-statistic

FA:
• Χ^2
• RMSEA
• CFI
• ...

LCA:
• Pearson χ2
• G^2

37
Q

Contrast the comparative model fit statistics involved in IRT, FA and LCA

A

The same in each:
• Likelihood ratio
• AIC
• BIC

38
Q

Contrast the latent variable estimates involved in IRT, FA and LCA

A

IRT:
Latent variable level (e.g., factor.scores() in ltm package)

FA:
Not covered in this course

LCA:
Class membership (e.g., predict() in LCA.r)