RSP Flashcards
why is FA preferable to PCA?
PCA is the default method of extraction in many statistical packages.
PCA is not a true method of FA, though; there is debate in the literature about whether it is an appropriate substitute, but FA is generally preferable.
PCA is only a data reduction method; it became popular back when computers were slow and expensive to use, because PCA was faster and cheaper to compute than FA.
PCA is computed without any regard to an underlying structure caused by the latent variables, so all of the variance is used and included in the solution. But how often do we collect data without an a priori idea of relationships? Not often.
FA identifies the latent variables that cause covariance in the manifest variables, partitioning shared variance from unique and error variance to reveal the underlying factor structure. PCA does not do this, so some values of variance accounted for may be inflated when using PCA.
How do PCA and FA differ?
PCA is fundamentally different from EFA: unlike factor analysis, PCA is used to summarize the information available from a given set of variables and reduce it into a smaller number of components.
In PCA, the observed items are assumed to have been assessed without measurement error. As a result, whereas both PCA and EFA are computed from correlation matrices, the former assumes a value of 1.00 (i.e., perfect reliability) in the diagonal elements while the latter uses reliability estimates. Thus, PCA is not a substitute for EFA in either a theoretical or a statistical sense.
In FA, latent factors drive the observed variables (i.e., responses on the instrument), while in PCA, observed variables are reduced into components. FA assumes that the observed items contain measurement error. With factor analysis, the a priori idea of the relationships between variables is accounted for, and the shared variance is partitioned from the unique and error variance for each variable, with only the shared variance appearing in the solution; PCA does not differentiate between shared and unique variance, which can produce inflated estimates of the variance accounted for by the factors.
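A minimal sketch of the diagonal difference described above, using numpy and made-up data: PCA eigendecomposes the correlation matrix with 1.00 on the diagonal, while a principal-axis approach to FA swaps in communality estimates (here, squared multiple correlations) so that only shared variance is analyzed. This is an illustration under those assumptions, not a full EFA routine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1:3] += X[:, [0]]              # induce shared variance among items 0-2
R = np.corrcoef(X, rowvar=False)    # correlation matrix, 1.00 on the diagonal

# PCA: eigendecompose R as-is -- all variance (shared + unique + error) is used.
pca_eigvals = np.linalg.eigvalsh(R)[::-1]

# Principal-axis factoring: replace the diagonal with communality estimates
# (squared multiple correlations), so only shared variance enters the solution.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # SMC of each item on the rest
R_fa = R.copy()
np.fill_diagonal(R_fa, smc)
fa_eigvals = np.linalg.eigvalsh(R_fa)[::-1]

print("PCA eigenvalues:", np.round(pca_eigvals, 2))
print("PAF eigenvalues:", np.round(fa_eigvals, 2))  # smaller: unique variance excluded
```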
How or when do you know when to use either PCA or EFA?
PCA is good for item reduction, specifically reducing the number of items while losing as little variance as possible.
PCA should be implemented during the item screening phase.
EFA is then used to determine the number of factors underlying the pool of items obtained from the PCA.
PCA should only be used to reduce the number of items in the scale within the item screening phase.
What is parallel analysis, how do you perform parallel analysis and when do you use it?
PA is the most accurate and objective approach to determining the number of factors underlying the data.
In PA, artificial data sets are generated with the same number of variables and observations as the original data, but with all variables random. Each parallel data set is factor analyzed, and the eigenvalues from each trial are recorded. The average of these eigenvalues is compared to the eigenvalues of the factors extracted from the original data. If the eigenvalue of a factor from the original data is greater than the average eigenvalue of the corresponding parallel factor, that factor is retained; if it is equal to or smaller than the average, the factor is considered no more substantial than a random factor and is discarded.
It is suggested that researchers running an EFA use this method to determine the number of factors underlying the data's variance, in tandem with other information (e.g., the interpretability of the factors).
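A minimal parallel-analysis sketch following the steps described above; the function name and toy data are invented, and full implementations (e.g., fa.parallel in the R psych package) offer more options, such as comparing against the 95th percentile rather than the mean.

```python
import numpy as np

def parallel_analysis(data, n_trials=100, seed=0):
    """Compare observed eigenvalues to the average eigenvalues of random data
    with the same number of observations and variables."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand = np.empty((n_trials, n_vars))
    for t in range(n_trials):
        fake = rng.normal(size=(n_obs, n_vars))     # all variables random
        rand[t] = np.linalg.eigvalsh(np.corrcoef(fake, rowvar=False))[::-1]
    # retain factors whose observed eigenvalue exceeds the random average
    return int(np.sum(obs > rand.mean(axis=0)))

data = np.random.default_rng(1).normal(size=(300, 10))
data[:, :4] += data[:, [0]]                          # build in one shared factor
print(parallel_analysis(data), "factor(s) retained")
```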
what is PCA?
PCA is a data reduction technique wherein the goal is to reduce the data while losing as little information as possible.
what is true FA?
True FA is a statistical technique that estimates the unobserved structure underlying a set of observed variables and their relationships with each other. It helps answer the question of whether collected data are aligned with the theoretically expected pattern.
what is CFA?
Confirmatory factor analysis (CFA) is a type of structural equation modeling that deals specifically with measurement models; that is, the relationships between observed measures or indicators (e.g., test items, test scores, behavioral observation ratings) and latent variables or factors. The goal of latent variable measurement models (i.e., factor analysis) is to establish the number and nature of factors that account for the variation and covariation among a set of indicators. A factor is an unobservable variable that influences more than one observed measure and accounts for the correlations among these observed measures. In other words, the observed measures are intercorrelated because they share a common cause (i.e., they are influenced by the same underlying construct); if the latent construct were partialled out, the intercorrelations among the observed measures would be zero. Thus, a measurement model such as CFA provides a more parsimonious understanding of the covariation among a set of indicators because the number of factors is less than the number of measured variables.
What is Fisher’s r to z transformation? Also, provide an original example of how you might use Fisher’s r to z transformation.
Fisher’s r to z transformation converts r statistics to z scores in order to determine whether two correlation coefficients (ra and rb) differ significantly, or whether two correlations differ in strength. If ra is greater than rb, z will have a positive sign; if ra is smaller, z will be negative. It works by transforming the sampling distribution of Pearson’s r into a normal distribution. It can also be used to determine confidence intervals for r and for differences between correlations. You can use tables to find these values or use the formula z’ = .5[ln(1+r) – ln(1-r)]. Merely finding the difference between correlations has limited use compared to testing whether the correlations differ in strength. Suppose you are conducting criterion-related validity studies on two different selection tests. You might use this transformation to determine which test, if either, has the stronger correlation with the criterion (in this case, likely job performance).
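A sketch of the selection-test example: comparing two independent correlations via the r-to-z transformation and a two-tailed z test. The correlation values and sample sizes are hypothetical.

```python
import math
from scipy.stats import norm

def fisher_z(r):
    return 0.5 * (math.log(1 + r) - math.log(1 - r))   # z' = .5[ln(1+r) - ln(1-r)]

# e.g., test A: r = .45 with n = 120; test B: r = .30 with n = 150 (made-up values)
ra, na = 0.45, 120
rb, nb = 0.30, 150
z_stat = (fisher_z(ra) - fisher_z(rb)) / math.sqrt(1 / (na - 3) + 1 / (nb - 3))
p = 2 * (1 - norm.cdf(abs(z_stat)))        # two-tailed p value
print(f"z = {z_stat:.2f}, p = {p:.3f}")    # z is positive because ra > rb
```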
Briefly explain intraclass correlation and provide an original example of how you might use intraclass correlations.
The intraclass correlation coefficient (ICC) is often used to measure interrater reliability, usually for more than two raters. It ranges from 0 to 1: an ICC closer to 1 indicates more agreement between raters, while an ICC closer to 0 indicates low reliability. There are several formulas for calculating the ICC, and it is a complex process to calculate by hand, mainly because the ICC is flexible and open to adjustment for inconsistency among raters. The ICC is a composite of intra- and inter-rater variability, which corroborates the need for differences to be non-systematic. Different models apply to the ICC: in one, each subject is rated by a different, randomly selected group of raters; in another, each subject is rated by the same group of raters. The ICC therefore produces a different measurement under each model. In the first, the ICC is a measure of absolute agreement; in the second, a choice can be made between consistency, wherein systematic differences between raters are irrelevant, and absolute agreement, in which systematic differences are relevant. There are also single and average measures of the ICC, usually given in software output: the single measure is an index for one single rater, while the average measure is an index for the reliability of the different raters averaged together. An example of when you might use the ICC: you want to know whether employees on a work team have similar levels of a trait, such as agreeableness, and then compare those scores across teams within the company. In this case, you would use the one-way coefficient because you want to know what proportion of variance is between subjects vs. within subjects.
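A minimal sketch of the one-way (absolute agreement) ICC mentioned above, computed from the usual ANOVA mean squares with numpy; the ratings matrix is hypothetical, and dedicated packages (e.g., pingouin in Python) provide the other models and the single/average forms.

```python
import numpy as np

def icc_oneway(ratings):
    """ICC(1): one-way random effects, absolute agreement, single measure.
    'ratings' is a subjects-by-raters matrix."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    ms_between = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
    # proportion of variance that is between subjects
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

ratings = np.array([[8, 7, 8], [5, 6, 5], [9, 9, 8], [4, 5, 4]], dtype=float)
print(round(icc_oneway(ratings), 3))   # high agreement -> ICC near 1
```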
How do parameters function in IRT?
A one-parameter model, as the name implies, has a single parameter: item difficulty, which is shown in an item characteristic curve (ICC) as the point where the slope of the S-curve is steepest. A two-parameter model adds item discrimination, which is how well an item discriminates between people with different levels of the latent trait in question; this is represented by the steepness of the slope in the ICC. A three-parameter model additionally adds a guessing parameter, or a y-intercept (with the y-axis of an item characteristic curve being the probability of getting the item correct). The y-intercept thus says, in IRT terms, “this is the probability of getting this item correct given the minimum level of the latent trait in question.” A four-parameter model adds an upper limit (an upper asymptote to go with the three-parameter model’s lower asymptote): a maximum probability of getting the item correct.
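All four models can be written as one 4PL response function with parameters fixed at defaults; a sketch with illustrative parameter values (the function name and numbers are my own):

```python
import numpy as np

def irf_4pl(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """P(correct | theta): a = discrimination, b = difficulty,
    c = guessing (lower asymptote), d = upper asymptote."""
    return c + (d - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(irf_4pl(theta, b=0.5))                        # 1PL: only difficulty varies
print(irf_4pl(theta, a=2.0, b=0.5))                 # 2PL: adds discrimination (slope)
print(irf_4pl(theta, a=2.0, b=0.5, c=0.2))          # 3PL: adds guessing (y-intercept ~ c)
print(irf_4pl(theta, a=2.0, b=0.5, c=0.2, d=0.95))  # 4PL: caps the maximum probability
```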
DIF: IRT vs. CTT
Differential item functioning is another aspect of psychometrics that is more easily assessed with item response theory methods than with methods based in classical test theory, owing to IRT's greater precision, particularly its use of item characteristic curves.
DIF in IRT
If subgroups–say men and women, for example–possess the same level of the latent trait (assuming, of course, this is what is theorized) but men have a higher probability of giving a correct (or incorrect, as the case may be) answer, the item shows DIF. This would be reflected in the item characteristic curves, quite literally showing the functioning of the item or items in question by subgroups of interest. In general, the more parameters in a model, the more accurate the description of item functioning becomes. That being said, the more parameters you want in your model, the larger your sample size needs to be, to the point where samples may become prohibitively large. One-parameter models such as the Rasch model are the most common. Using fewer parameters places a limit on the information you can take from an item: a limit on the explanation of the functioning of that item. The more parameters, the more thorough the explanation, but also the more participants you need. That places an asymptote, so to speak, on the information about item functioning that we can take from an IRT model.
ordinal scale
those whose values are placed in meaningful order, but the distances between the values are not equal.
interval scale
interval scales have values that have order, and also equal distances between each unit on the scale.
ratio scales
same as interval scales, but with a true zero point, where 0 indicates the absence of the quantity being measured
randomized control study
random assignment to groups and testing the effects of a particular treatment
quasi experimental design
in a quasi-experimental design, the research usually occurs outside of the lab, in a naturally occurring setting.
correlational research design
participants are not usually randomly assigned to groups. In addition, the researcher typically does not actually manipulate anything. Rather, the researcher simply collects data on several variables and then conducts some statistical analyses to determine how strongly different variables are related to each other.
drawbacks of experimental designs
they are often difficult to accomplish in a clean way and they often do not generalize to real-world situations.
what are the three basic components of IRT?
Item Response Function (IRF) – mathematical function that relates the latent trait to the probability of endorsing an item.
Item Information Function (IIF) – an indication of item quality; an item’s ability to differentiate among respondents.
Invariance – position on the latent trait can be estimated from any items with known IRFs, and item characteristics are population independent within a linear transformation.
provide a general overview of item analysis
Item analysis provides a way of measuring the quality of questions - seeing how appropriate they were for the respondents and how well they measured their ability/trait. It also provides a way of re-using items over and over again in different tests with prior knowledge of how they are going to perform; creating a population of questions with known properties (e.g. test bank)
item analysis can be broken down the following way
classical test theory, which is its own category.
latent trait models, which break down into IRT and Rasch models. IRT breaks down into 1PL (which is similar to the Rasch model), 2PL, 3PL, and 4PL models.
provide general overview of CTT
Classical Test Theory (CTT) analyses are the easiest and most widely used form of analysis. The statistics can be computed by readily available statistical packages (or even by hand). Classical analyses are performed on the test as a whole rather than on the item, and although item statistics can be generated, they apply only to that group of students on that collection of items. CTT is based on the true score model. In CTT we assume that the error: is normally distributed, is uncorrelated with the true score, and has a mean of zero.
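A quick simulation of these assumptions under the true score model X = T + E, with arbitrary numbers: error has mean zero and near-zero correlation with true scores, and reliability falls out as the ratio of true to observed variance.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(50, 10, size=100_000)      # true scores
E = rng.normal(0, 5, size=100_000)        # error: mean 0, independent of T
X = T + E                                 # observed scores

print(round(E.mean(), 3))                 # ~0: error has mean zero
print(round(np.corrcoef(T, E)[0, 1], 3))  # ~0: error uncorrelated with true score
print(round(T.var() / X.var(), 3))        # reliability = var(T)/var(X), ~ .80 here
```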
statistics involved in CTT
difficulty (item level), discrimination (item level), and reliability (test level)
CTT vs. Latent trait models
Classical analysis has the test (not the item) as its basis. Although the statistics generated are often generalized to similar students taking a similar test, they only really apply to those students taking that test. Latent trait models aim to look beyond that, at the underlying traits that produce the test performance. They are measured at the item level and provide sample-free measurement.
IRT general overview/description
refers to a family of latent trait models used to establish the psychometric properties of items and scales. Sometimes referred to as modern psychometrics because, in large-scale educational assessment, testing programs, and professional testing firms, IRT has almost completely replaced CTT as the method of choice. IRT has many advantages over CTT that have brought it into more frequent use.
IRT: Item Response Function
Item Response Function (IRF) - characterizes the relation between a latent variable (i.e., individual differences on a construct) and the probability of endorsing an item. The IRF models the relationship between examinee trait level, item properties, and the probability of endorsing the item. Examinee trait level is signified by the Greek letter theta (θ) and typically has mean = 0 and standard deviation = 1.
IRT: Item Characteristic Curves (ICC)
IRFs can then be converted into Item Characteristic Curves (ICCs): graphical functions that represent the probability of endorsing the item as a function of the respondent’s ability.
IRT item parameters: difficulty (b)
An item’s location is defined as the amount of the latent trait needed to have a .5 probability of endorsing the item. The higher the “b” parameter, the higher on the trait level a respondent needs to be in order to endorse the item. Like z scores, the values of b typically range from -3 to +3.
IRT item parameters: discrimination (a)
Indicates the steepness of the IRF at the item’s location. An item’s discrimination indicates how strongly related the item is to the latent trait, like loadings in a factor analysis. Items with high discrimination are better at differentiating respondents around the location point: small changes in the latent trait lead to large changes in probability. The reverse holds for items with low discrimination.
z scores
standard scores that help you understand where an individual score falls in relation to other scores in the distribution; a z score indicates how far above or below the mean a given score is, in SD units. z scores do NOT tell you how many items a person got correct, the level of ability the person has, how difficult the test was, etc. When used with a normal distribution, z scores can help determine percentile scores.
percentile scores
indicate percentage of distribution that falls below a given score
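A small illustration tying the last two cards together, assuming a normal distribution; the raw score, mean, and SD are made up.

```python
from scipy.stats import norm

raw, mean, sd = 82, 70, 8
z = (raw - mean) / sd                  # 1.5 SDs above the mean
percentile = norm.cdf(z) * 100         # % of the distribution below this score
print(f"z = {z:.2f}, percentile = {percentile:.1f}")   # ~93.3
```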
what is the standard error?
a measure of how much random variation you would expect from samples of equal size drawn from the same population; it is the standard deviation of the sampling distribution of whatever statistic you are looking at. It tells you how confident you should be that a sample mean represents the actual population mean: how much error can I expect when I select a sample of a given size from a population of interest?
central limit theorem
as long as you have a reasonably large sample size, the sampling distribution of the mean will be normally distributed, even if the distribution of scores in your sample is not
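A simulation sketch of both of the last two cards: the standard deviation of sample means (the standard error) tracks sigma/sqrt(n) even when the population is skewed, and the means pile up around the population mean. Population shape and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
population = rng.exponential(scale=2.0, size=1_000_000)   # skewed population
n = 50
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print(round(np.std(sample_means), 3))           # empirical standard error
print(round(population.std() / np.sqrt(n), 3))  # theoretical sigma / sqrt(n)
```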
what do p values represent
the probability of obtaining a statistic at least as extreme as the one observed by chance alone, assuming the null hypothesis is true
IRT Item response function: 3PL
the d parameter (upper asymptote) is set to 1 and a guessing parameter is added, so individuals at low trait levels have a non-zero probability of endorsing the item/getting it correct
IRT - 2PL
discrimination and difficulty parameters are included
IRT - 1PL
the item discrimination is set to 1.0 (or any constant). The 1PL assumes that all scale items relate to the latent trait equally and that items vary only in difficulty
primary difference between Rasch vs. IRT models
mathematically, Rasch is identical to the 1PL IRT model, but the Rasch model is held to be superior: data that do not fit the model are discarded, and Rasch does not allow abilities to be estimated for extreme items or people
IRT - Test Response Curve (TRC)
item response functions are additive, so items can be combined to form the TRC: the expected test score as a function of the latent trait
IRT - Item information function (IIF)
it replaces item reliability. This is the level of precision an item provides at each level of the latent trait; the IIF is an index representing the item’s ability to differentiate among individuals. The error variance of the latent trait estimate is the reciprocal of information (and the SEM is its square root); thus, more information = less error.
IRT - Test information function (TIF)
adding all the IIFs gives the TIF, which is used to judge the test as a whole and to see in which part of the trait range the test is working best.
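A sketch of the last few cards for a hypothetical three-item 2PL test: the IIF as I(theta) = a^2 * P * (1 - P), the TIF as the sum of IIFs, the TRC as the sum of IRFs, and the SEM as 1/sqrt(information). The item parameters are invented.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 61)
items = [(1.5, -1.0), (1.0, 0.0), (2.0, 1.0)]    # (a, b) for each item

iifs = [a**2 * p_2pl(theta, a, b) * (1 - p_2pl(theta, a, b)) for a, b in items]
tif = np.sum(iifs, axis=0)                        # test information function
trc = np.sum([p_2pl(theta, a, b) for a, b in items], axis=0)  # expected test score
sem = 1 / np.sqrt(tif)                            # more information = less error

print(f"test measures best near theta = {theta[np.argmax(tif)]:.1f}")
```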
IRT - invariance
examinee trait levels do not depend on which items are administered, and item parameters do not depend on a particular sample of examinees. Invariance allows us to link different scales that measure the same construct and to compare examinees even if they respond to different items; this is how IRT allows us to implement computerized adaptive testing (CAT).