Latent Class Analysis Flashcards
What is the main goal of latent class analysis?
You categorise people into groups (classes), e.g. “What calculator are you?”
Describe a latent class profile
The items go along the x-axis and the probability of scoring on each item goes along the y-axis. One line is drawn per class, connecting that class's probability for each item.
What data is applicable to latent class analysis?
Categorical latent data and categorical observed data.
The latent variable, θ, is categorical (e.g., θ=1, θ=2, θ=3, or θ=“avoidant”, θ=“anxious”, θ=“secure”).
Give some examples of categorical latent variables in psychology
- Developmental stages (Piaget)
- Attachment types A,B,C,D
- Deviant behavior: Autism, Dyslexia
- Learning styles
- Personality types
- Types of anti-social behaviour
- Mastery versus non-mastery of a skill
How do some researchers (who aren’t super into methods) categorise subjects? What problems arise here? (4)
Using their own ad-hoc criteria (cutting off continuous data at certain points to create categories):
- Criteria are arbitrary
- You cannot falsify classes
- You cannot find new classes
- No explicit falsifiable model
You need a clear theory that justifies these cut-off points; without one, the categories are very difficult to defend.
What are the categories referred to as?
Each category is called a “latent class”. Each subject belongs to exactly one class.
What assumption is there about the items within each class?
Within each class, the items are independent (local independence)
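A small numeric sketch of local independence, using made-up parameters (two classes, two items A and B; none of these values come from the course material). Within a class the joint probability is the product of the item probabilities by assumption, yet across the whole population the items are still associated, because class membership drives both.

```python
# Made-up parameters: two latent classes, two binary items A and B.
class_sizes = [0.6, 0.4]   # P(theta = t)
p_a = [0.9, 0.2]           # P(A = 1 | theta = t)
p_b = [0.8, 0.3]           # P(B = 1 | theta = t)

# Local independence: within class t, P(A=1, B=1 | t) = P(A=1 | t) * P(B=1 | t).
joint_within = [a * b for a, b in zip(p_a, p_b)]

# Marginal (whole-population) probabilities: weight each class by its size.
marg_a = sum(s * a for s, a in zip(class_sizes, p_a))
marg_b = sum(s * b for s, b in zip(class_sizes, p_b))
marg_ab = sum(s * j for s, j in zip(class_sizes, joint_within))

# If the items were independent overall, these two numbers would match.
print(f"P(A=1) * P(B=1) = {marg_a * marg_b:.3f}")
print(f"P(A=1, B=1)     = {marg_ab:.3f}")
```

With these values the two printed numbers differ, so the items are related in the full sample even though they are independent inside each class.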
Therefore, what is the goal of LCA?
Classify the subjects to the latent classes on the basis of the observed item scores
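A minimal sketch of the classification step, assuming the class sizes and conditional probabilities are already known (the numbers below are made up). It applies Bayes' rule to get the probability of each class given a response pattern, then assigns the pattern to the most probable class; that "highest posterior" rule is one common choice and is assumed here, not taken from the course material.

```python
from math import prod

# Made-up parameters: 2 classes, 4 binary items (A, B, C, D).
class_sizes = [0.6, 0.4]          # P(theta = t)
p_correct = [
    [0.9, 0.8, 0.7, 0.6],         # P(item = 1 | theta = 1)
    [0.2, 0.3, 0.1, 0.4],         # P(item = 1 | theta = 2)
]

def posterior(pattern):
    """Return P(theta = t | pattern) for each class t, via Bayes' rule."""
    joint = [
        size * prod(p if x == 1 else 1 - p for x, p in zip(pattern, probs))
        for size, probs in zip(class_sizes, p_correct)
    ]
    total = sum(joint)
    return [j / total for j in joint]

def classify(pattern):
    """Assign the pattern to the class with the highest posterior (1-based)."""
    post = posterior(pattern)
    return post.index(max(post)) + 1

print(classify((1, 1, 1, 0)))  # mostly correct answers
print(classify((0, 0, 0, 1)))  # mostly incorrect answers
```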
Compare the IRT model function to the LCA model function
In IRT, the conditional probabilities, 𝑃(𝑌𝑝𝑖 = 1 | 𝜃𝑝), are given by a line (curve) because θ is continuous. The line is characterised by the item parameters.
In LCA, the conditional probabilities, 𝑃(𝑌𝑝𝑖 = 1 | 𝜃𝑝 = 𝑡), are single points because the latent variable is categorical. The item parameters are these conditional probabilities (see doc). When graphed, the categories go along the x-axis and the conditional probabilities along the y-axis, although a table is more commonly used.
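To make the contrast concrete, here is a rough sketch. The IRT side assumes a two-parameter logistic curve with hypothetical parameters `a` and `b` (the flashcards do not say which IRT model is meant), and the LCA side uses made-up conditional probabilities for a single item in three classes.

```python
import math

def irt_probability(theta, a=1.0, b=0.0):
    """Assumed 2PL item response function: a curve over continuous theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# LCA: one conditional probability per class (made-up values for one item).
lca_probability = {1: 0.15, 2: 0.55, 3: 0.90}

# IRT gives a probability for any real-valued theta.
for theta in (-2.0, 0.0, 2.0):
    print(f"IRT  theta={theta:+.1f}  P(Y=1) = {irt_probability(theta):.3f}")

# LCA only has a value at each class: the parameters are the points themselves.
for t, p in lca_probability.items():
    print(f"LCA  theta={t}     P(Y=1) = {p:.2f}")
```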
Describe what you feed to R in LCA (or what R makes for you) in place of the covariance matrix used in factor analysis
There is a score pattern column listing the possible outcomes (e.g. 0000, all wrong; 0110, A and D wrong, B and C correct). Beside it is a column labelled F_ijkl, which lists how often each outcome was observed in the data. So if pattern 0000 has 93 beside it, then 93 participants scored all 4 items incorrectly. F_ijkl just means the frequency of response pattern [𝑖𝑗𝑘𝑙], e.g. 𝑖 is the response to A (𝑖 = 0 or 𝑖 = 1), 𝑗 the response to B, and so on.
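The pattern-frequency table can be sketched like this, assuming the raw data are one tuple of 0/1 scores per person. The six responses are made up, and this is only an illustration of the counting, not what any particular R package does internally.

```python
from collections import Counter
from itertools import product

# Made-up responses of 6 people to 4 binary items (A, B, C, D).
responses = [
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
]

# Count how often each response pattern occurs.
counts = Counter(responses)

# List every possible pattern, including those never observed (frequency 0).
print("pattern  F_ijkl")
for pattern in product((0, 1), repeat=4):
    label = "".join(str(x) for x in pattern)
    print(f"{label}     {counts[pattern]}")
```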
Instead of a covariance matrix, there are 𝑀 independent pieces of information (the number of response patterns minus 1): 𝑀 = 𝐶^𝑛 − 1
where 𝐶 is the number of response categories per item and 𝑛 is the number of items. So with 2 categories and 4 items:
𝑀 = 𝐶^𝑛 − 1 = 2^4 − 1 = 15 independent pieces of information
𝐶^𝑛 = ?
Why do we subtract 1?
𝐶^𝑛 = the number of response patterns
We subtract 1 because if we know all except one of the frequencies, we can calculate the last one from the total number of observations. Therefore it is not an independent piece of information.
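A quick check of the counting argument. The function name and the frequencies are made up for illustration; the second part shows why the last frequency carries no new information once the total is known.

```python
from itertools import product

def independent_pieces(n_categories, n_items):
    """M = C^n - 1: number of response patterns minus one."""
    return n_categories ** n_items - 1

# Cross-check by enumerating the patterns for 2 categories and 4 items.
patterns = list(product(range(2), repeat=4))
print(len(patterns), independent_pieces(2, 4))  # 16 15

# With N = 100 observations and 15 of the 16 frequencies known (made-up counts),
# the remaining frequency is fixed by subtraction.
known = [93, 2, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
last_frequency = 100 - sum(known)
print(last_frequency)  # 1
```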
Describe the typical output of a latent class model
A table with the classes as rows, the class size in the first column, and the items in the remaining columns. Each item cell gives the conditional probability for that item given the class.
What is meant by the class size?
The proportion of people in that class(ification)
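A sketch of what such an output table looks like, printed from made-up parameters (two classes, four items). Real software will lay it out differently; the point is only the arrangement of class sizes and conditional probabilities.

```python
# Made-up parameter estimates for a 2-class model with items A to D.
items = ["A", "B", "C", "D"]
class_sizes = [0.6, 0.4]          # proportion of people in each class
p_correct = [
    [0.9, 0.8, 0.7, 0.6],         # P(item = 1 | class 1)
    [0.2, 0.3, 0.1, 0.4],         # P(item = 1 | class 2)
]

# Header row: class label, class size, then one column per item.
lines = [f"{'class':<6}{'size':>6}" + "".join(f"{item:>6}" for item in items)]

# One row per class: its size followed by its conditional probabilities.
for t, (size, probs) in enumerate(zip(class_sizes, p_correct), start=1):
    row = "".join(f"{p:>6.2f}" for p in probs)
    lines.append(f"{t:<6}{size:>6.2f}{row}")

print("\n".join(lines))
```

The class sizes sum to 1 because every subject belongs to exactly one class.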
Describe a probability notation for the equation used in LCA
The probability of being in class 𝑡 and observing response vector [𝑖𝑗𝑘𝑙] is given by:
𝑃(𝑋𝑝 = [𝑖𝑗𝑘𝑙] & 𝜃𝑝 = 𝑡) = 𝑃(𝜃𝑝 = 𝑡) × 𝑃(𝑋𝑝1 = 𝑖 | 𝜃𝑝 = 𝑡) × 𝑃(𝑋𝑝2 = 𝑗 | 𝜃𝑝 = 𝑡) × 𝑃(𝑋𝑝3 = 𝑘 | 𝜃𝑝 = 𝑡) × 𝑃(𝑋𝑝4 = 𝑙 | 𝜃𝑝 = 𝑡)
i.e.
The probability of a pattern and being in a class = the probability of being in that class (class size) × the probability of that answer pattern given the class (e.g. prob. of 1 on A × prob. of 0 on B, etc.)
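A worked instance of this product, using made-up parameters for two classes and four items. The function name `joint_probability` is hypothetical, and classes are indexed from 0 in the code.

```python
# Made-up parameters for a 2-class model with 4 binary items (A, B, C, D).
class_sizes = [0.6, 0.4]          # P(theta = t)
p_correct = [
    [0.9, 0.8, 0.7, 0.6],         # P(X_i = 1 | theta = class 1)
    [0.2, 0.3, 0.1, 0.4],         # P(X_i = 1 | theta = class 2)
]

def joint_probability(pattern, t):
    """P(X = pattern and theta = t) under local independence (t is 0-based)."""
    prob = class_sizes[t]                      # start with the class size
    for response, p in zip(pattern, p_correct[t]):
        # Multiply by P(1 | class) for a correct answer, 1 - P(1 | class) otherwise.
        prob *= p if response == 1 else 1.0 - p
    return prob

pattern = (1, 0, 1, 1)
for t in range(len(class_sizes)):
    print(f"class {t + 1}: {joint_probability(pattern, t):.5f}")
```

For class 1 this is 0.6 × 0.9 × (1 − 0.8) × 0.7 × 0.6 = 0.04536.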
Explain the formula given in the book for this calculation
They use weird notation:
π^{ABCDX}_{ijklt} = π^{X}_{t} × π^{A|X}_{it} × π^{B|X}_{jt} × π^{C|X}_{kt} × π^{D|X}_{lt}
(superscripts name the variables, subscripts give their levels)
π refers to probability
X is the latent variable (instead of theta)
π^{X}_{t} is the class size: the probability of being at level t of X. π^{X}_{1} is the probability of being in class 1.
π^{A|X}_{it} is the conditional probability of giving response i on item A given level t of X. π^{A|X}_{11} is the probability of scoring correctly on item A given that you’re in class 1 (table in docs).