DONE: Methods Flashcards
Cortina (1993)
Addresses the confusion around coefficient (Cronbach's) alpha, which is a measure of internal consistency.
Alpha comes in different forms (Cronbach's or standardized) and is a function of the number of items in a scale (and this SHOULD be considered when interpreting alpha!).
Perhaps the main point:
Alpha is a sound measure of the proportion of error variance regardless of test length, but it is not always a good measure of internal consistency: when many items are pooled, internal consistency estimates are relatively invariant (i.e., large) and therefore somewhat useless.
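A minimal sketch (my own illustration, not from Cortina) using the standardized-alpha formula, alpha = k * r̄ / (1 + (k - 1) * r̄): holding the mean inter-item correlation fixed, alpha climbs simply by adding items.

```python
# Standardized alpha: alpha = (k * r_bar) / (1 + (k - 1) * r_bar),
# where k = number of items and r_bar = mean inter-item correlation.
def standardized_alpha(k, r_bar):
    return (k * r_bar) / (1 + (k - 1) * r_bar)

# Hold the mean intercorrelation at a modest .30 and just add items:
for k in [3, 6, 10, 20]:
    print(k, round(standardized_alpha(k, 0.30), 2))
# 3 -> 0.56, 6 -> 0.72, 10 -> 0.81, 20 -> 0.9
```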
One other wrinkle: there is both a LEVEL of alpha and a PRECISION of alpha. The precision is a function of the variance of the item intercorrelations.
And Cronbach's alpha is an estimate of reliability, not reliability per se, because only as the items in a test approach essential tau-equivalence does alpha approach reliability.
In short: error variance vs. internal consistency, and it's an estimate of reliability, not reliability per se.
Cronbach’s alpha (1951) is an effective measure of error variance regardless of test length, but not always a good measure of internal consistency, as internal consistency of a scale is strongly correlated with the number of items in that scale.
Coefficient alpha (Cronbach, 1951) is certainly one of the most important and pervasive statistics in research involving test construction and use, and is found in experimental psychology, sociology, statistics, medicine, counseling, nursing, economics, political science, criminology, gerontology, broadcasting, anthropology, and accounting.
The estimate of reliability that one uses must depend on the sources of variance that one considers relevant. If error factors associated with the passing of time are of interest, then test-retest or multiple administrations of parallel tests may be used.
If error factors associated with the use of different items are of interest, then internal consistency estimates, such as coefficient alpha (which takes into account variance attributable to subjects and variance attributable to the interaction between subjects and items), may be used.
Internal consistency refers to the degree of interrelatedness among the items whereas homogeneity refers to unidimensionality.
In one example, alpha was high in spite of the fact that one third of the item intercorrelations were zero. So, one conclusion that can be drawn with respect to what alpha measures is this: It is a function of the extent to which items in a test have high communalities and thus low uniquenesses.
It was mentioned above that a set of items can be somewhat interrelated and multidimensional. This is not so much an issue for the level of alpha, but rather for the precision of alpha.
Precision is measured in terms of the standard error of the item intercorrelations, which, in turn, is a function of the variance of the item intercorrelations; this standard error is what gives the precision of alpha.
A large standard error, although it does not provide enough information by itself to prove multidimensionality, is a symptom of multidimensionality.
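A quick sketch (my numbers, illustrating the level-vs-precision distinction): two 6-item scales with the same mean intercorrelation, and thus the same level of alpha, can have very different spreads of item intercorrelations, and thus very different precision.

```python
import numpy as np

# 15 pairwise correlations for a 6-item scale (6 choose 2 = 15).
uniform = np.full(15, 0.30)                              # unidimensional-looking
mixed = np.array([0.60] * 5 + [0.30] * 5 + [0.00] * 5)   # same mean, lumpy

for rs in (uniform, mixed):
    # Same level (same mean r -> same standardized alpha), different precision:
    print(round(rs.mean(), 2), round(rs.std(ddof=1), 2))
# 0.3 0.0  vs.  0.3 0.25 -- the large spread hints at multidimensionality
```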
Cronbach's alpha is an estimate of reliability, not reliability per se, because as the items in tests approach essential tau-equivalence (i.e., linearly related and differing only by a constant), as they do when the tests are composed of equal portions of general and group factor variance, Cronbach's alpha approaches reliability.
Most recent studies that have used alpha imply that a given level, perhaps greater than .70, is adequate or inadequate without comparing it with the number of items in the scale. THIS IS BAD BECAUSE:
simply having many items can drive up alpha!
This is not to say that the absolute level of alpha is meaningless. The proportion of error variance for a scale with alpha = .80 is exactly the same for any test regardless of the number of items.
Third, the precision of alpha (the standard error of the correlations in the item intercorrelation matrix) offers far more information about dimensionality than the size of alpha.
A scale can have a reasonable alpha even if it contains three orthogonal dimensions. To be fair, though, alpha does increase as a function of item intercorrelation, and alpha does decrease as a function of multidimensionality. But a LARGE alpha does not mean you have a unidimensional test!
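A sketch in the spirit of Cortina's demonstrations (the specific numbers are mine): 18 items in three mutually orthogonal clusters still produce a "respectable" alpha.

```python
import numpy as np

k = 18
R = np.zeros((k, k))
for block in range(3):
    i = slice(block * 6, (block + 1) * 6)
    R[i, i] = 0.70          # r = .70 within each 6-item cluster
np.fill_diagonal(R, 1.0)    # 0 between clusters, 1s on the diagonal

# Cronbach's alpha from a correlation matrix:
# alpha = k/(k-1) * (1 - trace(R) / sum(R))
alpha = k / (k - 1) * (1 - np.trace(R) / R.sum())
print(round(alpha, 2))      # ~0.82 despite three orthogonal dimensions
```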
Lee et al. (2011)
What is qualitative research? 4 THINGS:
First, it occurs in natural settings (i.e., ecological validity is less limited than in experiments conducted with substantial control in the lab). Generally speaking.
Second, qualitative data derive from the participants’ perspective. Typically, the researcher should not impose immediate interpretations. After substantial analysis, however, theoretical inductions (e.g., grounded theory) and/or particular interpretations (e.g., critical theory) become quite legitimate.
Third, qualitative research should be reflexive (i.e., flexible). Qualitative designs can be readily changed to fit the fluid or dynamic demands of the research setting. In our view, this attribute most easily differentiates qualitative from traditional quantitative research, which might be characterized as more rule driven or algorithmic (e.g., experiments, survey research).
Fourth, qualitative instrumentation, observation methods, and modes of analysis are not standardized. Instead, the individual psychologist serves as the main research instrument. (Although many recent and impressive software packages greatly facilitate data analysis, these packages do not yet substitute for the qualitative researcher's insight and inductive reasoning.) By contrast, quantitative research can easily take place in the lab, in field settings, or both; it more commonly imposes a particular theoretical perspective through which to understand data; it has strong consistency standards (e.g., experimental controls) to eliminate alternative explanations; and it requires more rigorous reliability and validity standards.
Lee et al. (2011)
Two major themes characterize much of the published qualitative research.
First, it is often a process of data reduction that simultaneously enhances and leads to inductions about theory. In other words, a vast amount of subjectively gathered data (e.g., context-specific observations) is logically and consistently reduced to enhance interpretive coherence.
Second, and noted previously, qualitative research involves few, if any, standardized, well-researched, or otherwise objectively observable instruments.
By analogy, qualitative research is like EFA: both take large amounts of data and reduce them to a more meaningful whole. With EFA, latent traits are often inferred; with qualitative research, coherent categories are often induced.
QUAL can still be used for hypothesis testing: Lee et al. (1996) tested specific hypotheses deduced from the unfolding model of voluntary turnover (Lee & Mitchell, 1994) on a sample of nurses. In particular, the unfolding model holds that turnover occurs through four distinct prototypical paths.
Three common types of QUAL designs:
(Lee et al., 2011)
Case study research, ethnography, and in-depth interview studies are the THREE MAIN TYPES OF RESEARCH DESIGNS
Case study research
First, it necessarily seeks to generate, elaborate, or test scientific theories and models. Second, case study research can occur within an in-depth investigation of a single case (e.g., the common "N = 1" situation) or across in-depth investigations of multiple cases. Third, it most often seeks to generate testable research propositions.
Lee et al. (2011) - 3 Limitations on qualitative research
First, it is very time intensive. As scholars, we have multiple demands on our time, and qualitative research may simply require too much time, especially in the early years of one’s career.
Second, it may be professionally risky: its time demands may slow other research projects, and because qualitative research isn't as conventional as quantitative work, it may be more challenging to publish.
Third, some organizational scholars claim we have too much theory. As mentioned earlier, the vast majority of qualitative research generates or elaborates theory.
More theory testing through qualitative means, however, might help counter this issue
It may also be hard to apply in areas where it is less common; for example, training and performance appraisal have been examined mostly with quantitative procedures.
Spector (2019)
There are two often expressed concerns with the cross-sectional design: common method variance and the inability to draw causal conclusions. The remedy for both of these concerns most often suggested is using a longitudinal (all variables are assessed at all time points) or prospective (different variables are assessed at different time points) design to introduce the element of time.
Spector (2019)
The unmeasured variables problem: for example, models where mood is not included (e.g., finding perceptions of supervision correlated with job satisfaction and not knowing it's because mood increases both of those variables). This is NOT a problem of common method variance.
If, however, mood has no impact on the underlying constructs, but merely affects their assessment, mood would be biasing those assessments and would serve as a source of common method variance, that is, an unintended influence on the assessment of the variables of interest.
Also, comparisons of corresponding cross-sectional versus longitudinal correlations in meta-analyses do not uniformly find larger correlations from cross-sectional designs (e.g., Nixon et al., 2011).
John Stuart Mill's necessary requirements for causation
1. Proposed cause and effect are related.
2. Proposed cause occurs prior to effect.
3. We can rule out feasible alternative explanations for observations of 1 and 2.
Spector adds a fourth element, the need to articulate a mechanism through which the cause can lead to an effect: 4. Proposed cause works through an articulated mechanism. Element 1 (covariation) is easy to establish through either cross-sectional or longitudinal designs.
Element 2 is not actually obvious from a longitudinal design: although such a design can provide a measurement of X before a measurement of Y, that is not the same thing as assessing X prior to Y happening and Y after X has occurred. This is because our studies rarely assess discrete events; we do not generally know when the levels of our X and Y variables were achieved, or which occurred prior to the other.
If we measure too soon, we will fail to detect the effects of X on Y because the process has not been completed. To provide an adequate test of the effects of X on Y, we would need to know how long the lag is between X occurring and Y happening (say, between starting to smoke and developing lung cancer).
For example:
If we wish to study the connection between workload and strain (e.g., among accountants, whose workload peaks around April tax deadlines), we need to look at their concurrent levels. If we conduct a cross-lag analysis with a two-wave study, we would be testing whether October workload predicts April strain, which in this case is nonsensical because the lag between workload and strain is not 6 months.
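For concreteness, a minimal sketch (hypothetical data and variable names, not Spector's) of what such a two-wave cross-lagged test amounts to:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
workload_oct = rng.normal(size=n)
strain_oct = 0.5 * workload_oct + rng.normal(size=n)
# April strain depends only on October strain here -- no 6-month workload lag.
strain_apr = 0.6 * strain_oct + rng.normal(size=n)

# Does October workload predict April strain beyond October strain?
X = np.column_stack([np.ones(n), strain_oct, workload_oct])
betas, *_ = np.linalg.lstsq(X, strain_apr, rcond=None)
print(betas)  # [intercept, autoregressive strain, cross-lagged workload]
```

A near-zero cross-lagged coefficient here would say nothing about whether workload causes strain at the true, much shorter lag.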
Merely choosing some arbitrary time points does not provide more definitive evidence than does a cross-sectional design and might well lead to erroneous inference as in the accountant example.
The use of a cross-sectional design would be inappropriate in cases where equilibration (the process by which X's effect on Y fully plays out) has not yet occurred, for example, if we assess X and Y at the same time but Y has not yet happened. For instance, if we want to study the effects of smoking on heart disease, it would not be fruitful to conduct a cross-sectional study of 20-year-olds, because there has not been sufficient time for young smokers to have developed the disease.
Even randomized experiments are plagued with potential demand characteristics and experimenter effects, and limitations to generalizability, not to mention uncertainty about how well the intervention actually moved ("wiggled") the intended X variable.
Qualitative methods might include focus groups, interviews, or surveys where people are asked to recall specific events and when they occurred. With interviews, in-depth discussions can include not only the order of events, but also informants’ explanations and interpretations of events that can be helpful in articulating tentative mechanisms deserving of further testing.
An advantage of using an alternative source is that it can serve as a control for some sources of method variance (Podsakoff et al., 2012). Yet, different sources can share biases when those sources are in contact with one another, such as employees and their supervisors (Spector, 2006).
The main issue in deciding to include an alternative source is whether the interest is in only the subjective view of the individual, or in a more objective feature of the person or workplace. Some phenomena concern internal states of people that would be difficult for an alternative source to assess anyway.
SUPER INTERESTING - Experimental approaches: Cross-sectional studies can be experimental in that X and Y are assessed under different conditions to rule out the effects of potential third variables that can be manipulated. For example, if one suspects that mood might serve as a common cause of both X and Y, one could measure X and Y under varying conditions of mood: in one condition, manipulate mood and measure the X-Y correlation; in the other, omit the mood manipulation and still measure the X-Y correlation.
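One way to analyze such a design (my sketch; the rs and ns are hypothetical) is a Fisher z test of whether the X-Y correlation differs between the mood-manipulation and control conditions:

```python
import numpy as np

# Fisher z test for a difference between two independent correlations.
def fisher_z_diff(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

z = fisher_z_diff(r1=0.42, n1=100, r2=0.38, n2=100)
print(round(z, 2))  # ~0.33: little evidence that mood changes the X-Y relation
```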
Spector (2019) - when to use cross sectional
Exploratory research
Testing new relationships, or old relationships in new contexts (moderators)
When you don’t know the proper lag times for longitudinal.
A cross-sectional design could be utilized where the X occurred prior to the survey and is assessed with retrospective questions in the survey. Finding differences on Y for people who experienced versus did not experience a merger has its limitations, but it can provide hints that mergers might have long-lasting effects and that such effects are worthy of further study.
You are interested in ruling out alternative explanations for covariation. Example - if the relationship between x and y remains the same whether including b as a control variable or not, you can discern that b is not a good alternative explanation.
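A toy sketch of that logic (hypothetical variable names and simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
b = rng.normal(size=n)
x = 0.3 * b + rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # b has no direct effect on y in this toy world

X_without = np.column_stack([np.ones(n), x])
X_with = np.column_stack([np.ones(n), x, b])
beta_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
# If x's coefficient barely moves when b is controlled, b is not a good
# alternative explanation for the x-y relationship.
print(round(beta_without[1], 2), round(beta_with[1], 2))
```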
Spector (2019) - When to use longitudinal:
- You wish to test the effects of an intervention: get a baseline measure and then assess the outcome once or, ideally, multiple times after the intervention to provide insight into how its effects unfold over time (are they transitory or long-lasting?).
- You wish to test the effects of an experience that occurs between waves.
- Discrete events can be good candidates for longitudinal investigation, particularly if you can compare individuals who have and have not experienced the event prior to the study. (There's less vacillation in discrete events, so you don't have to worry as much about the problem of arbitrary time points.)
- You know how long the time lag will be between the X and Y variables.
Spector (2019) - When you do cross-sectional, make it strong:
- Present a systematic analysis strategy that tells a compelling story. This might mean first establishing a clear relationship between X and Y, and then ruling out feasible alternative explanations, and/or illustrating boundary conditions through the use of moderator tests.
- If possible, incorporate a time element into the design. This could involve a retrospective event history.
- Or, it is possible to compare people who had versus did not have some experience in the past to see if it is related to an important variable at the present time. Even though this is a concurrent measure of all variables, it links what happens in the past (assuming people can accurately report, so ideally a significant event like being fired) with something in the present.
- Surveys can be designed to ask people for their judgments about the causes of events. Those judgments can be checked for consensus and compared with other forms of evidence to build a causal case.
Murphy & Aguinis (2019)
HARKING -
The reality that a large number of correlations are in the 0.10s means that a HARKing effect of about 0.10 correlation units could essentially double the observed effect. In other words, there is a potential for 100% inflation in the observed effect when the population effect is itself small.
As N decreases and the standard error increases, the likelihood that the most favorable study result (i.e., statistically significant and large) will deviate sharply from the population effect it estimates increases, meaning that there should be systematically more bias when N is small.
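A small simulation (mine, not the authors' code) of that small-N bias: report the largest of k candidate correlations, as a HARKer might, and watch the inflation grow as N shrinks.

```python
import numpy as np

# True rho = .10 for every candidate effect; the HARKer reports the max of k rs.
rng = np.random.default_rng(2)
rho, k, reps = 0.10, 10, 500

for n in (50, 200, 1000):
    best = []
    for _ in range(reps):
        x = rng.normal(size=(k, n))
        y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=(k, n))
        rs = [np.corrcoef(x[j], y[j])[0, 1] for j in range(k)]
        best.append(max(rs))
    print(n, round(float(np.mean(best)), 2))
# The mean "best" r sits far above .10 at n = 50 and shrinks toward .10 as n grows.
```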
When HARKing involves cherry-picking, which consists of searching through data involving alternative measures or samples to find the results that offer the strongest possible support for a particular hypothesis or research question, HARKing has only a small effect on estimates of the population effect size.
When HARKing involves question trolling, which consists of searching through data involving several different constructs, measures of those constructs, interventions, or relationships to find seemingly notable results worth writing about, HARKing produces substantial upward bias particularly when it is prevalent and there are many effects from which to choose.
Less problematic forms of HARKING—little potential to bias cumulative knowledge:
- Hypothesis proliferation: An author adds hypotheses to a study after data are collected and analyzed to place added emphasis on a result that was not part of the original conceptual design but was nevertheless going to be reported in the manuscript (e.g., correlation table).
- THARKing: An author is likely to transparently HARK in the discussion section of a paper by forming new hypotheses on the basis of results obtained (Hollenbeck & Wright, 2017). - For example, authors may describe in an article’s discussion section that particular hypotheses were based on the data they collected for their study. Hollenbeck and Wright (2017) argued that THARKing is not only ethical, but also likely to be beneficial.
Prob all you need from this paper.
Cherry-picking involves selectively reporting the most favorable results from different samples or measures that are all designed to address the same research question, which implies that the population effect all of these different sample results estimate is the same. That is, cherry-picking involves homogeneous effects. In contrast, question trolling involves heterogeneous effects: even the choice of what to study is driven by sample results, and the underlying population effects are heterogeneous (it's a mixed bag they're grabbing from).
Cherry-picking's impact is generally small: except when HARKing is very prevalent and the sample size is small, cherry-picked results have a small biasing impact on effect size estimates.
There are several actions that can be taken to minimize the detrimental effects of HARKing. For example, large sample sizes reduce the bias produced by cherry-picking.
Bedeian et al. (2010)
Over 90% of respondents in Bedeian et al.'s (2010) survey of research practitioners in management indicated they had knowledge of faculty members who had developed hypotheses after results were known.
Murphy & Aguinis (2019)
To prevent/deter HARKING
First, Occam’s razor is an essential tool for detecting HARKing. As Hollenbeck and Wright (2017) noted, HARKed hypotheses often involve convoluted reasoning or counterfactual assumptions.
Second, it is useful to have a healthily skeptical attitude; stories that are too good to be true may not be true. - It is unusual for every prediction and every hypothesis to be supported in a study.
HARKing is not simply a problem of author misbehavior. It is common for reviewers and editors to encourage authors to drop hypotheses and analyses that do not pan out, and this creates problems that have a good deal in common with HARKing.
Murphy & Aguinis (2019) - On how different models are similar to HARKing
Multiple regression has possible similarities to HARKING:
- Many variable selection methods are available to researchers, including forward selection algorithms (where variables are added to a regression model until they fail to produce incremental increases in R²).
Serious concerns have been raised about building prediction models on the basis of purely statistical criteria like these, including sensitivity to fluctuations in the data and poor replicability.
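A generic sketch of the kind of forward selection algorithm described above (my own illustration, not code from the paper):

```python
import numpy as np

def r_squared(X, y):
    # R^2 from an OLS fit with an intercept.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

def forward_select(X, y, min_gain=0.01):
    """Greedily add the predictor giving the biggest R^2 gain until the
    gain drops below min_gain. Purely data-driven -- hence the
    replicability worries noted above."""
    chosen, r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = {j: r_squared(X[:, chosen + [j]], y) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] - r2 < min_gain:
            break
        chosen.append(best)
        r2 = gains[best]
        remaining.remove(best)
    return chosen, r2
```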