RSP Flashcards

1
Q

why is FA preferable to PCA?

A

PCA is the default method of extraction in many statistical packages.

PCA is not a true method of FA though; there is debate in the literature about whether it is just as appropriate as FA or not, but generally FA is preferable.

PCA is only a data reduction method and only became popular back when computers were slow and expensive to use; PCA was faster and cheaper than FA.

PCA is computed without any regard to an underlying structure caused by the latent variables, so all of the variance is used and included within the solution. But how often do we collect data without an a priori idea of relationships? Not often.

FA will tell you about the latent variables that cause covariance in the manifest variables and involves partitioning of shared variance vs. unique and error variance to reveal underlying factor structure. PCA does not do this. This means some values of variance accounted for may be inflated when using PCA.

2
Q

How do PCA and FA differ?

A

PCA is fundamentally different from EFA because, unlike factor analysis, PCA is used to summarize the information available from a given set of variables and reduce it to a smaller number of components.

In PCA, the observed items are assumed to have been assessed without measurement error. As a result, although both PCA and EFA are computed from correlation matrices, the former assumes values of 1.00 (i.e., perfect reliability) in the diagonal elements while the latter utilizes reliability estimates. Thus, PCA is not a substitute for EFA in either a theoretical or a statistical sense.

In FA, latent factors drive the observed variables (i.e., responses on the instrument), while in PCA, observed variables are reduced into components. FA assumes the observed items contain measurement error. With factor analysis, the a priori idea of the relationships between variables is accounted for, and the shared variance is partitioned from the unique variance and error variance for each variable, but only the shared variance appears in the solution; PCA does not differentiate between shared vs. unique variance, which can produce inflated estimates of the variance accounted for by the factors.

3
Q

How do you know when to use PCA vs. EFA?

A

PCA is good for item reduction, specifically reducing the number of items while losing as little variance as possible.

PCA should be implemented during the item screening phase.

EFA is then used to determine the number of factors underlying the pool of items obtained from the PCA.

PCA should only be used in the context of reducing number of items in the scale within the item screening phase.

4
Q

What is parallel analysis, how do you perform parallel analysis and when do you use it?

A

PA is the most accurate and objective approach to determining the number of factors underlying the data.

In PA, artificial data sets are generated with the same number of variables and observations as the original data, but all of the variables are random. Each parallel data set is then factor analyzed, and the eigenvalues computed for each trial are recorded. The average of these eigenvalues is compared to the eigenvalues of the factors extracted from the original data. If the eigenvalue of a factor from the original data is greater than the average of the corresponding parallel eigenvalues, that factor is retained; if it is equal to or smaller than the average, that factor is considered no more substantial than a random factor and is therefore discarded.

It is suggested that researchers running an EFA use this method to determine the number of factors underlying the data's variance, in tandem with other information (e.g., the interpretability of the factors).
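A minimal sketch of the PA logic described above, using numpy; the function name, the use of correlation-matrix eigenvalues, and the comparison against the mean of the random eigenvalues are illustrative assumptions based on this card's description.

```python
import numpy as np

def parallel_analysis(data, n_trials=100, seed=0):
    """Retain factors whose observed eigenvalues exceed the average
    eigenvalues obtained from random data of the same shape."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the observed data's correlation matrix (descending)
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random (parallel) data sets of the same size
    rand_eigs = np.empty((n_trials, n_vars))
    for t in range(n_trials):
        random_data = rng.standard_normal((n_obs, n_vars))
        rand_eigs[t] = np.linalg.eigvalsh(
            np.corrcoef(random_data, rowvar=False))[::-1]
    mean_rand = rand_eigs.mean(axis=0)

    # Number of factors to retain
    return int(np.sum(obs_eigs > mean_rand))
```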

5
Q

what is PCA?

A

PCA is a data reduction technique wherein the goal is to reduce the data while losing as little information as possible.

6
Q

what is true FA?

A

True FA is a statistical technique that estimates the unobserved structure underlying a set of observed variables and their relationships with each other. It helps answer the question of whether collected data are aligned with the theoretically expected pattern.

7
Q

what is CFA

A

Confirmatory factor analysis (CFA) is a type of structural equation modeling that deals specifically with measurement models; that is, the relationships between observed measures or indicators (e.g., test items, test scores, behavioral observation ratings) and latent variables or factors. The goal of latent variable measurement models (i.e., factor analysis) is to establish the number and nature of factors that account for the variation and covariation among a set of indicators. A factor is an unobservable variable that influences more than one observed measure and which accounts for the correlations among these observed measures. In other words, the observed measures are intercorrelated because they share a common cause (i.e., they are influenced by the same underlying construct); if the latent construct was partialled out, the intercorrelations among the observed measures would be zero. Thus, a measurement model such as CFA provides a more parsimonious understanding of the covariation among a set of indicators because the number of factors is less than the number of measured variables.

8
Q

What is Fisher’s r to z transformation? Also. Provide an original example of how you might use Fisher’s r to z transformation.

A

Fisher's r to z transformation converts r statistics to z scores in order to determine whether two correlation coefficients (ra and rb) differ significantly, or whether two correlations have different strengths. When testing the difference, if ra is greater than rb, z will have a positive sign; if ra is smaller, z will be negative. It works by transforming the sampling distribution of Pearson's r into an approximately normal distribution. It can also be used to construct confidence intervals for r and for differences between correlations. You can use tables to find these values or use the formula z' = .5[ln(1+r) - ln(1-r)]. Finding the mere difference between correlations has limited use compared to determining the difference in the correlations' strengths. Example: suppose you are conducting criterion-related validity studies on two different selection tests. You might use this transformation to determine which test, if either, has the stronger correlation with the criterion (in this case, likely job performance).
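A small sketch of this use case in Python; the function names and the sample sizes in the example are hypothetical, and the standard error of the difference between two independent z' values is sqrt(1/(na-3) + 1/(nb-3)).

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation: z' = 0.5 * [ln(1+r) - ln(1-r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

def compare_correlations(ra, na, rb, nb):
    """z test for the difference between two independent correlations.
    A positive z means ra > rb after transformation."""
    za, zb = fisher_z(ra), fisher_z(rb)
    se_diff = math.sqrt(1 / (na - 3) + 1 / (nb - 3))
    return (za - zb) / se_diff

# Example: criterion-related validities of two selection tests
z = compare_correlations(ra=0.45, na=120, rb=0.30, nb=150)
# Compare |z| to 1.96 for a two-tailed test at alpha = .05
```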

9
Q

Briefly explain intra class correlation and provide an original example of how you might use intra class correlations.

A

The intraclass correlation coefficient (ICC) is often used to measure interrater reliability, usually for more than 2 raters. It ranges from 0 to 1: an ICC closer to 1 indicates more agreement between raters, while an ICC closer to 0 indicates low reliability. There are several formulas that can be used to calculate the ICC, and it is quite a complex process to calculate by hand, mainly because the ICC is flexible and open to adjustment for inconsistency among raters. The ICC is a composite of intra- and inter-rater variability, which corroborates the need for differences to be non-systematic. There are different models applied to the ICC: in one model, each subject is rated by a different and randomly selected group of raters, and in another, each subject is rated by the same group of raters. The ICC will therefore produce a different measurement for each of these models. In the first, the ICC is a measure of absolute agreement; in the second, a choice can be made between consistency, wherein systematic differences between raters are irrelevant, and absolute agreement, in which systematic differences are relevant. There are also single and average measures of the ICC, usually given in software outputs: the single measure is an index for one single rater, and the average measure is an index for the reliability of the different raters averaged together. An example of when you might use the ICC is if you want to know whether employees on a work team have similar levels of a trait, such as agreeableness, and then compare those scores across teams within the company. In this case, you would use the one-way coefficient because you want to know what proportion of variance is between subjects vs. within subjects.
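A minimal sketch of the one-way, single-measure case (ICC(1)) computed from one-way ANOVA mean squares; the function name and the equal-group-size assumption are illustrative.

```python
import numpy as np

def icc1(ratings):
    """One-way random-effects ICC(1).
    `ratings` is an (n_targets x k_raters) array; each target is rated
    by a different random set of k raters (one-way model)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)

    # Between-target and within-target mean squares from a one-way ANOVA
    ss_between = k * np.sum((target_means - grand_mean) ** 2)
    ss_within = np.sum((ratings - target_means[:, None]) ** 2)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```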

10
Q

How do parameters function in IRT?

A

A one-parameter model, as the name implies, has a single parameter; one of item difficulty, which is shown in an item characteristic curve (ICC) as the point where the slope is the steepest in the S-curve. A two-parameter model adds item discriminability, which is how well an item discriminates between people with different levels of the latent trait in question—this is represented by the steepness of the slope in the ICC. A three-parameter model additionally adds a guessing parameter, or a y-intercept (with the y-axis of an item characteristic curve being the probability of getting the item correct). A y-intercept thus says, in IRT terms, “this is the probability of getting this item correct given the minimum level of the latent trait in question.” A four-parameter model adds an upper limit (an upper asymptote to the three-parameter model’s lower asymptote)—a maximum probability of getting the item correct.
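As a worked illustration of how these parameters enter the model, here is a sketch of the four-parameter logistic item response function; the parameter defaults and example values are illustrative, and the 1.7 scaling constant is omitted.

```python
import math

def irf_4pl(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """Four-parameter logistic IRF.
    a = discrimination (slope), b = difficulty (location),
    c = lower asymptote (guessing), d = upper asymptote.
    d = 1 gives the 3PL; c = 0 and d = 1 give the 2PL;
    additionally fixing a gives the 1PL."""
    return c + (d - c) / (1 + math.exp(-a * (theta - b)))

# Probability that a respondent at theta = 1.0 endorses a moderately
# difficult (b = 0.5), fairly discriminating (a = 1.5) item with guessing
p = irf_4pl(theta=1.0, a=1.5, b=0.5, c=0.2)
```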

11
Q

DIF: IRT vs. CTT

A

Differential item functioning is another aspect of psychometrics which is more easily assessed using item response theory methods than those based in classical test theory due to its greater precision, particularly due to the use of item characteristic curves.

12
Q

DIF in IRT

A

If subgroups (say, men and women) possess the same levels of the latent trait (assuming, of course, this is what is theorized) but men have a higher probability of giving a correct (or incorrect, as the case may be) answer, the item shows DIF. This would be reflected in item characteristic curves, quite literally showing the functioning of the item or items in question for the subgroups of interest. In short, the more parameters within a model, the more accurate the description of an item's functioning becomes. That being said, the more parameters you want in your model, the bigger your sample size needs to be, to the point where samples may become prohibitively large. One-parameter models such as the Rasch model are the most common ones. This places a limit on the information you can take from an item, a limit on the explanation of the functioning of that item. The more parameters, the more thorough the explanation, but also the more participants you need. That places an asymptote, so to speak, on the information about item functioning that we can take from an IRT model.

13
Q

ordinal scale

A

those whose values are placed in meaningful order, but the distances between the values are not equal.

14
Q

interval scale

A

values have order, but they also have equal distances between each unit on the scale.

15
Q

ratio scales

A

same as interval, but with a true zero point: a value of 0 indicates the absence of the quantity being measured

16
Q

randomized control study

A

random assignment to groups and testing the effects of a particular treatment

17
Q

quasi experimental design

A

in a quasi-experimental design, the research usually occurs outside of the lab, in a naturally occurring setting, and participants are not randomly assigned to conditions.

18
Q

correlational research design

A

participants are not usually randomly assigned to groups. In addition, the researcher typically does not actually manipulate anything. Rather, the researcher simply collects data on several variables and then conducts some statistical analyses to determine how strongly different variables are related to each other.

19
Q

drawbacks of experimental designs

A

they are often difficult to accomplish in a clean way and they often do not generalize to real-world situations.

20
Q

what are the three basic components of IRT?

A

Item Response Function (IRF): a mathematical function that relates the latent trait to the probability of endorsing an item. Item Information Function: an indication of item quality; an item's ability to differentiate among respondents. Invariance: position on the latent trait can be estimated from any items with known IRFs, and item characteristics are population independent within a linear transformation.

21
Q

provide a general overview of item analysis

A

Item analysis provides a way of measuring the quality of questions - seeing how appropriate they were for the respondents and how well they measured their ability/trait. It also provides a way of re-using items over and over again in different tests with prior knowledge of how they are going to perform; creating a population of questions with known properties (e.g. test bank)

22
Q

item analysis can be broken down the following way

A

Classical test theory, which is its own category, and latent trait models, which break down into IRT and Rasch models. IRT breaks down into 1PL (which is similar to the Rasch model), 2PL, 3PL, or 4PL models.

23
Q

provide general overview of CTT

A

Classical Test Theory (CTT) analyses are the easiest and most widely used form of analyses. The statistics can be computed by readily available statistical packages (or even by hand). Classical analyses are performed on the test as a whole rather than on the item; although item statistics can be generated, they apply only to that group of students on that collection of items. CTT is based on the true score model. In CTT we assume that the error is normally distributed, is uncorrelated with the true score, and has a mean of zero.

24
Q

statistics involved in CTT

A

Difficulty (item level), discrimination (item level), and reliability (test level).

25
Q

CTT vs. Latent trait models

A

Classical analysis has the test (not the item) as its basis. Although the statistics generated are often generalized to similar students taking a similar test, they only really apply to those students taking that test. Latent trait models aim to look beyond that at the underlying traits which are producing the test performance; they are measured at the item level and provide sample-free measurement.

26
Q

IRT general overview/description

A

IRT refers to a family of latent trait models used to establish the psychometric properties of items and scales. It is sometimes referred to as modern psychometrics because in large-scale educational assessment, testing programs, and professional testing firms, IRT has almost completely replaced CTT as the method of choice. IRT has many advantages over CTT that have brought it into more frequent use.

27
Q

IRT: Item Response Function

A

Item Response Function (IRF): characterizes the relation between a latent variable (i.e., individual differences on a construct) and the probability of endorsing an item. The IRF models the relationship between examinee trait level, item properties, and the probability of endorsing the item. Examinee trait level is signified by the Greek letter theta (θ) and typically has a mean of 0 and a standard deviation of 1.

28
Q

IRT: Item Characteristic Curves (ICC)

A

IRFs can then be converted into Item Characteristic Curves (ICCs), which are graphical functions that represent the probability of endorsing the item as a function of the respondent's ability (trait level).

29
Q

IRT item parameters: difficulty (b)

A

An item's location is defined as the amount of the latent trait needed to have a .5 probability of endorsing the item. The higher the b parameter, the higher on the trait a respondent needs to be in order to endorse the item. Like z scores, the values of b typically range from -3 to +3.

30
Q

IRT item paramaters: discrimination (a)

A

Indicates the steepness of the IRF at the item's location. An item's discrimination indicates how strongly related the item is to the latent trait, like loadings in a factor analysis. Items with high discriminations are better at differentiating respondents around the location point: small changes in the latent trait lead to large changes in probability. The reverse holds for items with low discriminations.

31
Q

z scores

A

standard scores that help understand where an individual score falls in relation to other scores in the distribution; number that indicates how far above or below the mean a given score is in SD units. z scores do NOT tell you how many items a person got correct, the level of ability the person has, how difficult the test was, etc. when used with a normal distribution, z scores can help determine percentile scores

32
Q

percentile scores

A

indicate percentage of distribution that falls below a given score

33
Q

what is the standard error?

A

a measure of how much random variation you would expect from samples of equal size drawn from the same population; it’s the standard deviation of the sampling distribution of whatever stat you’re looking at. it tells you how confident you should be that a sample mean represents the actual population mean; how much error can I expect when I select a sample of a given size from a population of interest?

34
Q

central limit theorem

A

as long as you have a reasonably large sample size, the sampling distribution of the mean will be normally distributed, even if the distribution of scores in your sample is not

35
Q

what do p values represent

A

the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true, i.e., the probability of getting the statistic by chance alone

36
Q

IRT Item response function: 3PL

A

the upper asymptote (d) parameter is set to 1 and a lower asymptote (guessing parameter, c) is added, so individuals at low trait levels have a non-zero probability of endorsing the item/getting it correct

37
Q

IRT - 2PL

A

discrimination and difficulty parameters are included

38
Q

IRT - 1PL

A

the item discrimination is set to 1.0 (or any constant). The 1PL assumes that all scale items relate to the latent trait equally and that items vary only in difficulty

39
Q

primary difference between Rasch vs. IRT models

A

mathematically, the Rasch model is identical to the 1PL IRT model, but the Rasch model is held to be superior: data that don't fit the model are discarded, and Rasch doesn't allow abilities to be estimated for extreme items or people

40
Q

IRT - Test Response Curve (TRC)

A

item response functions are additive, so items can be combined to form the TRC. The TRC shows the expected number of items endorsed (or answered correctly) as a function of the trait level.

41
Q

IRT - Item information function (IIF)

A

it replaces item reliability. This is the level of precision an item provides across levels of the latent trait. The IIF is an index representing the item's ability to differentiate among individuals. The error variance of the latent trait estimate is the reciprocal of information (and the SEM is its square root); thus, more information = less error.
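A small sketch of the 2PL case, in which item information is a^2 * P * (1 - P); the item parameter values below are made up for illustration.

```python
import math

def info_2pl(theta, a, b):
    """Item information for a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

# Test information is the sum of the item informations; the standard
# error of the trait estimate is 1 / sqrt(information)
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]   # (a, b) pairs
test_info = sum(info_2pl(0.0, a, b) for a, b in items)
se_theta = 1 / math.sqrt(test_info)
```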

42
Q

IRT - Test information function (TIF)

A

adding all the IIFs to judge the test as a whole and see at which part of the trait range the test is working best.

43
Q

IRT - invariance

A

examinee trait level estimates don't depend on which items are administered, and item parameters don't depend on the particular sample of examinees. Invariance allows us to link different scales that measure the same construct and compare examinees even if they respond to different items; this is how IRT allows us to implement CAT.

44
Q

IRT - assumptions

A

Unidimensionality: individual differences are characterized by the single parameter theta; a single common factor (the trait) accounts for the item covariance. Local independence: item responses are uncorrelated after controlling for the latent trait; it is related to unidimensionality. Homogeneous population: the same IRF applies to all members of the population. DIF is a violation of this, meaning that examinees who are equal on the trait are responding differently/have different probabilities of answering correctly.

45
Q

IRT - local dependence

A

Local dependence (LD) is when the assumption of local independence is violated. LD distorts item parameter estimates and causes scales to look more precise than they actually are. When LD exists, a large correlation between 2 or more items can dominate the trait and cause the scale to lack construct validity. To rectify this, remove one or more of the LD items from the scale.

46
Q

IRT applications: Differential item functioning

A

Comparing response probabilities across different age groups, genders, races, and SES backgrounds. DIF analysis can also be used when translating tests into different languages, by testing the equivalency of the translated items.

47
Q

IRT applications:Scale construction and modification

A

IRT allows for the creation of a universe of items with known IRFs that can be used interchangeably. You can also use IRT to revamp a CTT scale.

48
Q

CAT

A

an extension of IRT. Once a universe of items with known IRFs is created, they can be used to form a CAT version of the test. An item is given to the person; theta is assumed to be 0 for the first item, so an item of easy or moderate difficulty (b) is chosen. Their answer then allows for a new estimate of theta, and the next item is chosen based on that theta level. This is reiterated until a final theta estimate is reached.
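A toy sketch of this loop under simple assumptions (a 2PL item bank, most-informative item selection, and a crude grid-search maximum-likelihood theta estimate); real CAT systems use more refined estimators, exposure control, and stopping rules.

```python
import math
import random

def p_2pl(theta, a, b):
    """2PL probability of endorsing/answering the item correctly."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

def estimate_theta(responses, grid):
    """Crude maximum-likelihood estimate of theta over a grid of values."""
    def loglik(theta):
        return sum(math.log(p_2pl(theta, a, b)) if x == 1
                   else math.log(1 - p_2pl(theta, a, b))
                   for (a, b), x in responses)
    return max(grid, key=loglik)

def run_cat(item_bank, true_theta=1.0, n_items=10, seed=0):
    rng = random.Random(seed)
    grid = [g / 10 for g in range(-30, 31)]        # theta values -3.0 .. 3.0
    theta, responses, remaining = 0.0, [], list(item_bank)
    for _ in range(n_items):
        # Choose the unused item that is most informative at the current theta
        item = max(remaining, key=lambda ab: item_info(theta, *ab))
        remaining.remove(item)
        # Simulate the examinee's response given their true theta
        x = 1 if rng.random() < p_2pl(true_theta, *item) else 0
        responses.append((item, x))
        theta = estimate_theta(responses, grid)    # re-estimate after each answer
    return theta

# Example: a made-up bank of (a, b) item parameters
bank = [(1.0, -2.0), (1.2, -1.0), (0.9, 0.0), (1.5, 0.5), (1.1, 1.0),
        (1.3, 1.5), (0.8, 2.0), (1.4, -0.5), (1.0, 0.8), (1.2, 2.5)]
theta_hat = run_cat(bank, true_theta=1.0, n_items=8)
```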

49
Q

Discuss the issues with NHST

A

Testing for statistical significance is important because it allows us to determine whether the effects we obtain from a statistical test are due to random chance or sampling error, or to something more meaningful. However, NHST is largely dependent upon sample size. With larger sample sizes, the chances of detecting a significant effect increase; thus, even trivial effects are flagged as significant, even though they may not be of practical significance. Relatedly, with small sample sizes the chance of committing a Type II error increases: when there is a true effect present, the likelihood of the effect being detected statistically is decreased, so large effects may be present in small samples but not statistically significant. This has facilitated a movement toward measures of practical significance, such as effect size, which gives information on the magnitude of an effect.

We have over-relied on NHST and placed too much emphasis and importance on it. Proponents acknowledge its limitations and say that instead of giving a definitive yes or no answer it should serve as an indicator or a guide; opponents say it should be abandoned and is an injustice to the scientific method. A major point of contention with NHST is the assumption that follows from a significant result: that significance can be assumed to be noteworthy. There is a need to stress the difference between practical and statistical significance, which is why effect sizes have been proposed as a secondary means of interpreting results, and why there has been a push and mandate for them to be included in journal articles/write-ups. The problem is also exacerbated by the fact that many journals, especially big names, are often the first to turn away research that is not significant (the file drawer effect). Many critics go so far as to say we shouldn't use the word "significant" at all because it is misleading. Daniel (1998) outlines several misconceptions derived from NHST that are now commonplace among social scientists, including treating importance and significance as synonymous, the odds-against-chance fantasy, the replicability fantasy, the belief that it is the optimal way to evaluate statistics, and the automatic assumption that p results will translate to other samples.

Another issue is NHST's reliance on sample size as a means of achieving significant results. A potential remedy would be to report "what if" analyses, which indicate at what sample size a significant result would become non-significant. Bootstrapping may also be promising as a way to replicate effects observed in the original sample, and CIs should be reported as a means of estimating the sampling error.

50
Q

confidence intervals

A

Confidence intervals provide another measure, alongside effect size, that allows us to make reasonable predictions about the approximate values of the population parameters. The CI provides a range of values that we are confident, to a certain degree of probability, contains the true population parameter: "We are 95% confident that the true population value lies between the values of X and Y." When you increase the probability to 99%, your range of values becomes broader. Larger samples allow for a more accurate representation of the population, thus narrowing the CI.

51
Q

what conditions must be met in order to assume that the probabilities based on the normal distribution are accurate?

A

When the population SD is known and/or you have a large sample, usually greater than 120

52
Q

briefly discuss the t distribution

A

a family of distributions that adjusts probabilities that can’t be based on the normal distribution by taking the sample size into account. The t distribution you use for a given problem depends on the size of your sample. the shape of the t distribution changes as a function of the sample size, thus changing the probabilities associated with it. df allows for the sample size to be taken into account

53
Q

describe when to use the different types of t tests

A

One-sample t test: use when you want to compare the mean from a single sample to a population mean. Independent samples: use when you want to compare the means of two independent samples on a given variable; there is no overlap between the 2 samples. Dependent: compare 2 means from a single sample or from two matched/paired samples; often used to measure the same group at 2 time points.

54
Q

data requirements for independent samples T test

A

One categorical or nominal IV and one continuous DV; homogeneity of variance (especially if sample sizes aren't equal), normality, and roughly equal sample sizes. The Mann-Whitney U test is the non-parametric alternative to this test.

55
Q

independent t tests: conceptual explanation of the standard error of the difference

A

You want to know: if you were to repeatedly select random samples of a certain size from each of the two populations, what would be the average expected difference between the means? Is the difference we see between our two sample means large or small compared to the difference we would expect just by selecting two different random samples? This involves combining the SEMs of the two samples. The two samples thus need to be roughly equal in size, because equal weight is given to each sample in calculating the SED; if they are not, a formula can be used to adjust for unequal sample sizes and give proper weight to each sample's contribution to the overall SED. The SED is basically the average expected difference between any 2 randomly selected samples from the 2 populations. It gets smaller as your sample sizes get larger, which increases the chance of a significant t. Standard = average; error = amount of difference.
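A small sketch of the pooled-variance version, in which each sample's contribution is weighted by its degrees of freedom; the function names and the example values are illustrative.

```python
import math

def pooled_sed(s1, n1, s2, n2):
    """Standard error of the difference between two independent means,
    using the pooled variance (weights each sample by its df)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

def independent_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic for an independent-samples t test."""
    return (mean1 - mean2) / pooled_sed(s1, n1, s2, n2)

t = independent_t(mean1=75.0, s1=10.0, n1=30, mean2=70.0, s2=12.0, n2=30)
```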

56
Q

determining the significance of the t value for an independent samples t test

A

finds the probability of getting a t value of that size by chance. To examine practical significance, simply look at the differences between the raw scores and use your judgment, especially if your sample sizes are large; also look at the effect size.

57
Q

standard error of the difference for dependent samples t test

A

take the distribution of the difference scores between the pre- and post-tests and get the mean and SD of that distribution. The SD of the difference scores, divided by the square root of the sample size, gives the SED between the dependent samples.

58
Q

data requirements: one way ANOVA

A

A categorical or nominal IV with at least 2 independent groups and a continuous DV. Homogeneity of variance in the DV is also necessary.

59
Q

Difference between one way ANOVA and independent samples t test; when to use a one way vs. a t test

A

For two groups, the results are the same except that ANOVA produces an F ratio, which is simply the t value squared. The one-way ANOVA is usually used when comparing more than two groups: running multiple t tests increases the chance of a Type I error, because the probability of being wrong increases with each additional test, and ANOVA fixes this by adjusting for the number of groups being compared.

60
Q

discuss post hoc tests for ANOVA

A

ANOVA provides an F value but does not determine among which groups there are significant differences on the DV. Thus, post hoc tests are designed to compare each group mean to each other group mean and determine whether the differences are statistically significant, while controlling for multiple comparisons. Some post hoc tests are more conservative, meaning it is more difficult to detect a difference; some are more liberal. They are theoretically similar to conducting multiple t tests, but post hoc tests keep the Type I error rate constant across comparisons. The difference between the types of post hoc tests is what each uses for the standard error: some are designed to take differing sample sizes into account, others adjust for heterogeneity of variance in the DV across groups. In general, more conservative tests allow us to be more confident that we are not committing a Type I error.

61
Q

describe the purpose of the one way ANOVA

A

To determine the amount of variance in the DV that is attributable to between-group differences vs. within-group differences (random error, which is basically scores within a sample differing from the sample mean). It answers the question: is the average amount of difference or variation between the scores of members in different samples large or small compared to the average amount of variation within each sample (random error)?

62
Q

effect size in ANOVA

A

The percentage of variance in the DV that is explained by the IV; eta squared is the most common for anova

63
Q

factorial ANOVA

A

One continuous DV and 2+ categorical/nominal IVs; you get main effects and interactions. It allows you to examine effects of an IV on a DV over and beyond or in addition to the other IV (controlling for/partitioning effects)

64
Q

Interaction effects in ANOVA

A

Are when the differences between the groups of one IV on the DV vary according to the level of the second IV; aka moderator effects

65
Q

simple effects in ANOVA

A

Analogous to post hoc tests; they test whether there are differences in the average scores of particular cells.

66
Q

ANCOVA

A

Explores the effects of the IV(s) while controlling for other variables; partitions out the variance in the DV due to these covariates, which can be either categorical/nominal or continuous. This helps control for error, or variance in the model that is not due to the IV(s), by identifying and controlling for these sources of variance. Using Type III sums of squares in the ANCOVA calculation allows us to determine the unique effect of each main effect and interaction effect on the DV.

67
Q

advantages of a repeated measures ANOVA vs. paired t tests

A

With RM ANOVA you can examine differences on a DV measured at more than 2 time points. With RM ANCOVA you can control for covariates. With RM you can also include one or more categorical independent/grouping variables.

68
Q

RM ANOVA: variance partitioning and calculation of F ratio

A

We want to find out how much error variance there is in the DV, but also how much of the total variance can be attributed to differences across time or trial, i.e., within individuals across the occasions on which they were measured on the DV (within-subjects variance). Remember that we are asking whether there are differences, on average, between the scores at the different time points. The F ratio: how large is the difference between the average scores at Time 1 and Time 2 relative to the average amount of variation among subjects in their change from T1 to T2? It's a measure of systematic variance divided by random variance in scores.

69
Q

thresholds for weak, moderate, and strong correlation coefficients

A

-.20 to .20 = weak; .20 to .50 = moderate; .50 and above = strong. Remember that context matters: in selection, for example, a correlation that is weak by this standard could be considered strong, since there is so much variance to predict in job performance.

70
Q

discuss how curvilinear relationships affect the interpretation of correlation coefficients

A

When the relationship is curvilinear, the correlation coefficient can be pulled down, making it quite small and suggesting a weaker relationship than may actually exist. This means that some correlation coefficients may be incorrectly interpreted at face value.

71
Q

Most common type of correlation coefficient

A

Pearson product-moment correlation; but it is limited in that both variables must be measured on interval or ratio scales. Specialized versions of this coefficient can be used when this requirement is not met, including the point-biserial, phi, and Spearman rho coefficients.

72
Q

Point biserial, phi, and spearman rho correlation coefficients: what are they and when to use

A

Point-biserial: one variable is continuous, the other is dichotomous. Phi: both variables are dichotomous. Spearman rho: both variables use ranked data.

73
Q

Coefficient of determination

A

Tells us how much of the variance in the scores of one variable can be understood or explained by scores on the second variable; it describes the shared variance between the two variables. A larger r means a larger coefficient of determination. The COD is r squared.

74
Q

truncated range

A

A problem with correlation coefficients in which the scores on one or both variables don’t have much variation; this can happen with ceiling or floor effects where a very easy or very hard test affects the range of scores

75
Q

Assumptions of regression

A

Variables: predictors are continuous or dichotomous; the DV is continuous. A linear relationship between the predictors and the DV. All variables are normally distributed. Predictor variables can't be too strongly correlated with each other (multicollinearity). Homoscedasticity: the errors in the prediction of the DV are about the same at all levels of the predictor.

76
Q

Simple Regression vs. correlation

A

Regression yields more information and is more intuitive; it provides a formula for calculating predicted values rather than just a single number.

77
Q

The regression formula uses what?

A

Ordinary least squares (OLS), which is based on the sum of squared deviations: take the distance between each point and the regression line, square these distances for each point (squared deviations), and then add them together to get the SS. The regression line is the line with the smallest possible sum of squared deviations, i.e., the line of best fit.
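A minimal sketch of the closed-form OLS solution for a single predictor; the function name and data values are made up for illustration.

```python
def ols_line(x, y):
    """Slope and intercept that minimize the sum of squared deviations
    between the observed y values and the regression line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = ss_xy / ss_xx              # slope
    a = mean_y - b * mean_x        # intercept
    return a, b

a, b = ols_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
predicted = [a + b * xi for xi in [1, 2, 3, 4]]
```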

78
Q

Effect size in multiple regression

A

Is the R squared value, which is the coefficient of determination, also a way of determining practical significance

79
Q

CTT vs. IRT general

A

With IRT you can examine the item level, but with CTT you can only look at the test as a whole. CTT doesn't consider how examinees respond to a given item, and no basis exists for determining how well an examinee might do when confronted with a test item. CTT offers limited solutions to common testing problems such as test design, adaptive testing, identification of biased items, and equating of test scores.

80
Q

problems with p values and proposed alternatives

A

P values say nothing about the magnitude of an effect, only the confidence we can have that the relationship exists; when the difference between 2 means becomes narrower, we need more people in order to detect it. A newer school of thought calls for reporting confidence intervals, allowing for subjective judgment of results, followed by effect sizes. Bayesian statistics aggregates results, and confidence grows as more data are integrated.

81
Q

EFA vs. CFA

A

EFA has no a priori hypothesis about what the scale is going to be like. You're just trying to describe relationships and nothing is specified in advance; there are no inferential statistics, only descriptive ones. Rotation in EFA is used to improve interpretability: orthogonal rotation optimizes interpretability, and varimax is good when you want to create sub-scales, even if you expect the dimensions to be correlated with each other. Communality is a measure of the shared variance between scale items. CFA gives us inferential statistics. You use the information you gathered from the EFA to make a priori hypotheses about the scale. It gives model fit significance values and takes error estimates into account. This means that if you're using this construct in a broader research context, the relationship that is estimated with CFA is estimated with minimal measurement error. This is good because you're using the latent construct estimate while controlling for and partitioning out the error; it gets measurement noise out of the way for you.

82
Q

PCA

A

Tries to explain the maximum amount of total variance in a correlation matrix by placing 1s on the diagonal (i.e., treating all of the variance as shared). It's good for making sub-scales that you want to generalize from your sample to the population.

83
Q

explain Factor Analysis in general

A

Observed vs. latent variables: how well do items represent the construct we're trying to measure? Variables must be continuous and normal, and the sample size must be large enough. We are pulling items from a "universe" of items to best represent the construct. Responses on a dimension should be correlated with each other; this indicates the items are truly measuring the same thing. This is what EFA does for us: it calculates correlations between items and tells us which items hang together, which items are measuring the same thing, and to what extent this is happening. It first finds which items are most strongly correlated with each other and groups them together, then looks for the next strongest batch; in this way it attempts to create factors that explain the most variance possible in all the items. Stronger correlations between items = more variance explained. Extraction stops when new factors would not explain any appreciable additional variance.

Factor loadings = how items fit into a factor; you want them to be high. They generally range from -1 to 1. Even high negative factor loadings are good; this is usually what happens with negatively worded items. Factor loadings less than .30 are usually discarded. Eigenvalues = variance explained by a factor; sometimes people use the cut-off of at least 1.0, but this is debated. It's ultimately up to the researcher; think in practical terms too, in terms of which items go best with which factors.

84
Q

explain factor rotation in FA

A

In the process of identifying and creating factors, FA works to make the factors distinct from each other. In orthogonal rotation, the FA rotates the factors to maximize the distinction between them, so it will create the first factor and then try to make the second factor as distinct and different as possible. Without this rotation, the FA might produce several factors that were all some variation of the most highly correlated items before getting around to the next, less strongly correlated items. Orthogonal (varimax) rotation ensures factors are as unique as possible. Oblique rotation does not assume orthogonality and allows factors to be correlated with each other. The rotated factor matrix tells you how the items are related to each factor after this rotation has occurred.

85
Q

Factor analysis; reliability analysis

A

Reliability analysis is specifically what tells you how well the items hold together (internal consistency) after the FA has created the groups/factors. Most commonly used is Cronbach's alpha; the idea is that all survey items receive similar responses from respondents, indicating they measure a single underlying construct. This means the construct is being measured reliably by the items. Alpha reflects the average correlation among a set of items, and the more items there are, the higher alpha will be. Alpha also depends on the strength of the correlations between items.
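A minimal sketch of the usual computational formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name is illustrative.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```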

86
Q

CFA

A

Is used to test how well a hypothesized structure fits a set of data; it is sometimes used after an EFA, once another sample has been collected, if you're developing a scale. It's a type of SEM. If you already have a good guess about how the variables in the study (or the survey items) will go together, CFA is used to test and confirm this: you organize the items a priori according to a strong theoretical rationale. CFA produces a series of fit statistics that provide information on how well your proposed factor model fits the data collected. Modifications are needed if you get weak fit statistics; if they are strong, your model is confirmed.

87
Q

Best practices in meta analysis

A
  1. Report summary effect sizes, but also the variance around the overall estimate and moderators that could contribute to this variance across studies
  2. The quality of a meta-analysis is dependent upon the quality of the studies you put into it (garbage in, garbage out); be clear about exclusion criteria and when discussing generalizability of results
  3. Use proper methods to assess potential publication bias, such as the file drawer effect
  4. Meta-analysis cannot identify causal relationships, but it can be used to produce hypotheses about them because it can detect consistency of a relationship across settings, which is important for causal inference
  5. Perform power calculations prior to the meta-analysis; sample size is not the only thing that affects power
  6. Use the most accurate estimation procedures available, even if the resulting estimates are only marginally superior to alternative methods
88
Q

meta analysis: purpose, importance and goals

A

Meta-analysis is important because it synthesizes the accumulation of knowledge. The goals are twofold: 1. estimate the overall strength and direction of an effect or relationship, and 2. estimate the across-study variance in the distribution of effect size estimates and identify moderators that explain this variance. If the effects are consistent, this shows the effect is robust across a range of studies; if there is considerable dispersion, the effect could be context specific. Metas are not meant simply to provide a summary effect, which is too often all that is reported. The degree of heterogeneity in study inclusion is determined by the purpose and specificity of the question the meta is exploring.

89
Q

realist evaluation

A

Answers the why, how, for whom, and in what context questions for interventions by theoretically developing and testing Context + Mechanism = Outcome (CMO) configurations. A CMO configuration can pertain to the entire intervention or to parts of it, and one CMO configuration can be embedded in another. The realist evaluation strategy focuses on three themes: understanding the Mechanisms through which an intervention achieves its Outcomes, understanding the Contextual conditions necessary for triggering these Mechanisms, and understanding Outcome patterns. Some factors in the context may enable certain mechanisms to trigger intended outcomes, and therefore interventions cannot simply be transferred from one context to another; there is always an interaction between context and mechanisms, and it is this interaction that creates the intervention's outcomes. The interplay between participants in the intervention and the structures in which the intervention is embedded determines the outcomes of the intervention, and research should thus focus on how these agent-structure interactions produce outcomes.

90
Q

criticisms of RCT for org interventions and what the proposed alternative is

A

Internal validity is hard to achieve because an RCT doesn't demonstrate whether an effect is due to the intervention itself or to other factors. For organizational interventions we need to focus more on context to figure out what works, for whom, and under what circumstances; we should examine the content and processes through which interventions are effective and the conditions that trigger them, to get a better understanding of how interventions work. This approach is called realist evaluation. Meta-analyses are often used to look at aggregated intervention research, but this is not ideal because it implies that intervention processes are uniform. Assuming that contextual variables are confounds isn't a good approach; a context-process-outcome evaluation framework is a better way to advance our understanding of intervention effectiveness. With realist evaluation, interventions are said to work through the employees' choice to change their behaviors, not the intervention itself; the behaviors produce the outcome rather than the intervention itself. A shared understanding of the changes that need to be made is needed between employees and managers, because diverging interpretations impact the outcomes of the intervention. Realist evaluators argue that the impact of interventions cannot be determined without understanding the impact of individual perceptions and behaviours.

91
Q

things that affect tests for moderation

A

Range restriction: when the sample variance is less than the population variance, the statistical power for detecting moderating effects is diminished. If it's not feasible to get a full range of possible scores from your sample, the estimated population variance should be provided to rule out range restriction as a possible alternative explanation for your results. Unequal sample sizes across moderator-based categories: when the moderator is categorical, the sample sizes need to be roughly equal across these groups. Artificially dichotomizing continuous moderating variables: this loses information by reducing the variance of the moderator. Don't do this.

92
Q

what is moderated mediation?

A

The path that constitutes a mediated model varies according to the level of a moderator variable

93
Q

item discrimination: why would indices be low? Negative?

A

Low: item bias, or the item is too easy. Negative: the item is not measuring what the rest of the exam is measuring, or there are item wording issues.

94
Q

types of measurement error

A

Random response error: caused by variations in attention, etc.; it occurs within occasions of measurement and reduces the value of all reliability estimates. Transient error: occurs across occasions; caused by variations across people in mood or mental state. It is estimated by correlating responses across occasions (days) and comparing this to an estimate of the within-occasion correlation; the difference is the estimate of the proportion of total variance that is transient. Specific factor error: reflected in factors specific to certain items or scales (or raters) that are consistent over time but not part of the construct being measured. It's the product of an interaction between examinees or ratees and items, scales, or raters.

95
Q

How is the performance domain different from the predictor domain?

A

They are conceptually distinct in that the universe to be sampled is delineated differently. Construct domains on the predictor side leverage theoretical frameworks that generalize behaviors; performance domains are determined/influenced by organizational decision makers, and selection researchers collaborate with them to translate broad organizational objectives into something that can be measured by behaviors. Additionally, the sources of covariance are different: for predictors, covariance is naturally occurring, but for performance behaviors this covariance is induced outside of the individual's control.

96
Q

MTMM

A

The MTMM matrix is a procedure for establishing construct validity. It requires the application of at least two different methods to measure at least two different traits. The construct validity of each trait is determined through discriminant and convergent validation. The matrix tells you about the adequacy of a construct and the relationships between constructs, and it helps to identify and develop a nomological network. For convergent validity, correlations between measures of similar constructs using different methods should be large and significant.

97
Q

criterion models

A

Criterion models have been neglected in terms of both theory and empirical testing. Models include the ultimate criterion model and the multiple criteria model; the criteria are obtained the same way, but in the ultimate criterion model all criteria are combined into a linear composite that reflects overall success. The multiple criteria model focuses on collecting multiple criteria of performance and determining the dimensionality of, and an understanding of, the criteria.

98
Q

RWA

A

Identifies relative contribution of each predictor to the criterion by resolving issues of multicollinearity involved in regular analyses. It transforms variables to create and estimate weights for each predictor assuming that they’re orthogonal (uncorrelated). RWA is more sophisticated and allows for a stronger, more accurate estimate of what each variable is accounting for. Results are standardized betas free of multicollinearity. These betas can then be pushed back into the original analysis to predict outcome measures more accurately.

99
Q

Hinkin 1998 method to scale development

A
  1. Item generation (either deductive or inductive)
  2. Questionnaire administration: administer your scale plus other established measures (which theory dictates) to examine the nomological network, and also to provide preliminary criterion, convergent, and discriminant validity evidence, and hence construct validity evidence for your scale
  3. Initial item reduction: EFA (200 people needed); also examine internal consistency reliability for each of the new scales from the EFA
  4. CFA: collect data on a new independent sample using the new measure
  5. Convergent/discriminant validity: use the MTMM matrix
  6. Replication: use a new sample if possible and continue collecting evidence
100
Q

alpha beta gamma change; what are they, and how do you evaluate them?

A

alpha = absolute, real change. Beta = change that results from the respondent's subjective recalibration of the construct being measured; it has to do with how they perceive the measurement scale and its intervals (the way they conceptualize the measurement intervals, or the construct "yard stick," has changed). Gamma = change that results from the respondent's reconceptualization of the construct being measured; it has to do with how they perceive the actual construct (maybe they now have a better grasp of what the construct actually is). ABG change is especially relevant to self-report surveys in OD interventions. To evaluate, examine and compare the factor structures of the pre vs. post test measures: if these are highly similar, you most likely do not have beta or gamma change; if they are different, this implies beta or gamma change took place, which calls into question the true effects of the intervention.

101
Q

Bayesian models

A

Bayesian models are a proposed alternative to NHST. A Bayes factor is the degree to which observed data should either strengthen or weaken the credibility of one hypothesis in comparison to another. Bayes factors are especially helpful for comparing a directional hypothesis to the null hypothesis, hence the idea that they could be a good alternative to NHST. The inferences made from Bayesian models are more accurate in the sense that they can account for information already known about a particular effect.

Bayes' theorem basically focuses on how probabilities should be updated given new information, and it is a mathematical way of doing this. It provides information on the believability of each potential value of a parameter, providing more accurate conclusions. Summarizing the results can be difficult despite how informative they are, but this has become easier with technology. Bayesian methods are growing in use due to the limitations of traditional data analysis approaches.
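A tiny sketch of the updating idea on a discrete pair of hypotheses; the prior and likelihood values are made up for illustration.

```python
def bayes_update(prior, likelihoods):
    """Discrete Bayes update: posterior is proportional to likelihood * prior.
    `prior` maps hypotheses to prior probabilities; `likelihoods` maps the
    same hypotheses to P(observed data | hypothesis)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two competing hypotheses about an effect, updated after new data
prior = {"effect": 0.5, "no_effect": 0.5}
likelihoods = {"effect": 0.8, "no_effect": 0.3}   # P(data | hypothesis)
posterior = bayes_update(prior, likelihoods)
bayes_factor = likelihoods["effect"] / likelihoods["no_effect"]
```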

102
Q

When would the mean value in a meta might not reflect the target population?

A
  1. Measurement error or range restriction influences the test statistic values (can correct for this)
  2. When moderating effects are present
  3. When only a small number of studies is available
103
Q

general steps for conducting meta analysis

A
  1. Specify what is to be studied; but important to balance precision with practicality because # of studies decreases as you get more specific
  2. Look for studies: this is where publication bias becomes a problem for metas. Look for unpublished studies and docs such as dissertations and reports
  3. Establish a criteria list for inclusion: could be type of test, procedure, publication date, anything important.
  4. Calculate test statistic: r for variables, d for groups
  5. Mathematically summarize findings: the mean of the test statistic is found first, which involves weighting the statistics according to the sample sizes of the studies (correcting for sampling error). Next, see whether a moderator effect is present by looking at the variability of the values (see the sketch below).
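A bare-bones sketch of that summary step, using sample-size weights; the study values are made up, and corrections for measurement error and range restriction are omitted.

```python
def weighted_mean_r(results):
    """Sample-size-weighted mean correlation across studies
    (results = list of (r, N) pairs)."""
    total_n = sum(n for _, n in results)
    r_bar = sum(r * n for r, n in results) / total_n

    # Observed (weighted) variance of the correlations across studies
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in results) / total_n
    return r_bar, var_obs

r_bar, var_obs = weighted_mean_r([(0.30, 200), (0.22, 150), (0.41, 90)])
```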
104
Q

how are moderating effects detected in meta analysis?

A

Hunter and Schmidt's 75% rule: the variance expected to occur from sampling error and other artifacts is computed and compared with the actual (observed) variance across studies. If the expected variance is at least 75% as large as the observed variance, moderators do not exist or have minimal influence. If not, one or more moderators exist; in that case, studies are categorized by the moderator and a new meta-analysis is conducted for each category separately.
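A rough sketch of the comparison, using the approximation that the sampling error variance of a single study's correlation is (1 - rbar^2)^2 / (N - 1); artifacts other than sampling error are ignored, and the example values are made up.

```python
def check_75_rule(results):
    """Hunter & Schmidt 75% rule sketch (results = list of (r, N) pairs)."""
    total_n = sum(n for _, n in results)
    r_bar = sum(r * n for r, n in results) / total_n

    # Observed (weighted) variance of correlations across studies
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in results) / total_n

    # Variance expected from sampling error alone, averaged across studies
    var_error = sum(n * (1 - r_bar ** 2) ** 2 / (n - 1)
                    for _, n in results) / total_n

    moderators_unlikely = var_error >= 0.75 * var_obs
    return moderators_unlikely, var_error, var_obs

ok, var_e, var_o = check_75_rule([(0.30, 200), (0.22, 150), (0.41, 90)])
```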

105
Q

IRT vs. CTT

A

IRT = stronger assumptions equate to stronger findings; this is especially true for the way that error is treated.

IRT = difficulty and ability are scaled on the same metric. This means you can compare a person’s ability and the difficulty of an item meaningfully.

For CTT the findings hold only for the sample it's used on; in IRT the parameters aren't sample or test dependent. This enables CAT to be applied using IRT.

in CTT a common estimate of the measurement precision is used that is assumed to be equal across individuals regardless of their ability level; in IRT measurement precision depends on the ability level. This means that there are differences between CTT and IRT in their conclusions about ability level

IRT is based on the test taker’s performance level to determine overall theta. IRT does not assume that all items are equally difficult; ICCs are actually used to scale items. It is better for high stakes tests. It is more mathematically complex than CTT. Focuses on the items themselves vs. CTT which focuses on the test as a whole; in IRT a response model is generated for each item.

IRT is based on probability; specifically the probability of endorsing an item is a mathematical function of ability level and item parameters. IRT parameters are: difficulty, discrimination, guessing.

IRT is a latent trait model. Latent, meaning that item responses are the observable manifestations of some trait or ability, which is inferred by response patterns.

Why is IRT better?

IRT = greater flexibility, more information, information is more accurate; the math is more complex; reliability of the test is improved

IRT = enables the use of CAT

IRT = acknowledges that precision isn’t constant across the entire range of test scores; scores at the outskirts of the test’s range have more error than scores closer to the middle of the range.

IRT = item and test information replace reliability; information is a function of the model parameters. More information = less error in measurement

Treatment of measurement error, indexed by the SEM; IRT allows error to vary for examinees, where CTT assumes that the error is the same for each person.

Both theories assume that there is a true score, an observed score, and error; but these are treated differently in each.

106
Q

compare/contrast ANOVA vs. Regression (general)

A

similarities:

both predict a continuous outcome

ANOVA is a special case of regression with categorical predictors

mathematically the same; both use the general linear model (GLM)

OLS regression with categorical (dummy-coded) predictors = factors in ANOVA

contrast:

F test vs. using r squared

OLS does not have to be used in ANOVA but can be

SS in ANOVA = between and within groups; SS in regression = model SS and error SS

different applications; ANOVA is best used for experimental designs, regression is best used to establish relationships and predict future outcomes (see the sketch below)
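A small numeric sketch of the equivalence noted above: the F from a one-way ANOVA equals the F computed from the R squared of a dummy-coded OLS regression. The toy data are made up for illustration.

```python
import numpy as np

# Toy data: 3 groups, 4 observations each
groups = [np.array([4.0, 5.0, 6.0, 5.5]),
          np.array([7.0, 8.0, 7.5, 8.5]),
          np.array([5.0, 6.0, 5.5, 6.5])]
y = np.concatenate(groups)
N, k = len(y), len(groups)

# One-way ANOVA: partition SS into between- and within-group components
grand_mean = y.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_anova = (ss_between / (k - 1)) / (ss_within / (N - k))

# Same model as OLS regression with dummy-coded group membership
X = np.zeros((N, k))
X[:, 0] = 1                      # intercept
X[4:8, 1] = 1                    # dummy for group 2
X[8:, 2] = 1                     # dummy for group 3
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r_squared = 1 - (resid ** 2).sum() / ((y - grand_mean) ** 2).sum()
f_regression = (r_squared / (k - 1)) / ((1 - r_squared) / (N - k))

# f_anova and f_regression are identical (within floating-point error)
```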

107
Q

compare and contrast quasi experimental and true experimental designs. when would you use each type of design?

A

Experimental vs. Quasi experimental design

Similarities: both used to examine cause and effect phenomena

Study participants subjected to a certain treatment/condition

An outcome of interest is measured, and the researcher tests whether differences in this outcome are due to the treatment

Differences:

Quasi does not have random assignment, they usually are pre existing groups. In true experiment, there is random assignment to conditions.

Quasi: the treatment and control groups often differ in terms of the treatment they receive, but also on other unknown or unknowable factors. Thus, the researcher should try to statistically control for as many confounds as possible.

Because quasi lacks control, there may be several alternative explanations for observed results

Key components of true experimental designs:

Manipulation of predictor variables: the researcher manipulates a factor that is believed to affect the outcome of interest; this is the treatment and levels of a treatment

Random assignment: all have same chance of being put into given condition, this neutralizes factors other than the IV and DV, which makes it possible to infer causality.

Random sampling: doesn't always happen in practice, but there are problems with generalizability when it doesn't.

The primary disadvantage of experimental designs is the trade off that occurs with external validity due to the artificial nature of the study setting. It doesn’t reflect what happens in the real world.

108
Q

types of validity in research and their threats

A

Statistical conclusion validity: whether an effect was observed in an experiment, regardless of how the effect was produced. This has to do with the proper use of different statistical tests to answer this question.

Threats: unreliability of measures in either the IV or DV affects the ability to detect an effect. Reliability affects the outcomes of statistical analyses.

Extraneous variables in the experimental setting: any uncontrolled factors in the setting that can produce error variance that interferes with effect detecting.

Internal validity: can a causal relationship be interpreted based on how the variables were manipulated.

Threats: attrition: when participants drop out or refuse to complete the study. For example if one condition of training was too hard people in that condition may drop out, obscuring the contribution of the IV.

History: events that occur between the pre- and post-test that could produce the observed effect.

Testing: influence of test exposure on a subsequent test; issue for pre post test designs

The above two types of validity address the issue of experimental control.

Construct validity: validity about inferences of the constructs measured in the study.

Threats: quality of operational definitions: such as not including important aspects of a construct within the definition, or construct confounding when something important is not controlled for and has an influence on the measured constructs.

Interaction of treatment with experimental arrangements; such as novelty and disruption effects (the treatment is new or novel) or reactivity to the experimental situation (people’s perceptions of the environment), or experimenter effects (the participants catch on to what the true nature of the study is and respond according to these expectations)

potential threats to construct validity should be addressed and minimized at the outset of experimental planning.

External validity: do the cause and effect inferences hold across persons, settings, etc.

Threats: sampling strategy: since random sampling is usually not feasible, participants may not be truly representative of the population they are meant to represent.

Context dependent mediation is where a variable that mediates a causal relationship in one context or setting differs from mediations in the relationship in a different setting.

addressed through systematic replication.

External and construct pertain to the generalizability of inferences.

109
Q

solomon four group design

A

introduces 4 conditions

  1. Pre test, treatment, post test
  2. Pre test, no treatment, post test
  3. Treatment, post test
  4. No treatment, post test

The effectiveness of treatment can be determined by comparing groups 1 and 2 and comparing groups 3 and 4.