DONE: Methods Flashcards
Cortina (1993)
Addresses the confusion around coefficient (Cronbach's) alpha, which is a measure of internal consistency.
Alpha comes in different forms (Cronbach's or standardized) and is a function of the number of items in a scale (and this SHOULD be considered when interpreting alpha!).
Perhaps the main point:
Alpha is a sound measure of the proportion of error variance regardless of test length, but it is not always a good measure of internal consistency: when many items are pooled, internal consistency estimates are relatively invariant (i.e., large) and therefore somewhat useless.
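A minimal sketch (my own illustration, not from Cortina) using the standardized-alpha formula, alpha = k * r̄ / (1 + (k - 1) * r̄): holding the mean inter-item correlation fixed, alpha climbs simply by adding items.

```python
# Standardized alpha: alpha = (k * r_bar) / (1 + (k - 1) * r_bar),
# where k = number of items and r_bar = mean inter-item correlation.
def standardized_alpha(k, r_bar):
    return (k * r_bar) / (1 + (k - 1) * r_bar)

# Hold the mean intercorrelation at a modest .30 and just add items:
for k in [3, 6, 10, 20]:
    print(k, round(standardized_alpha(k, 0.30), 2))
# 3 -> 0.56, 6 -> 0.72, 10 -> 0.81, 20 -> 0.9
```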
One other wrinkle: there is both a LEVEL of alpha and a PRECISION of alpha. The precision is a function of the variance of the item intercorrelations.
And Cronbach's alpha is an estimate of reliability, not reliability per se, because only as the items in a test approach essential tau-equivalence does alpha approach reliability.
In short: error variance vs. internal consistency, and it's an estimate of reliability, not reliability per se.
Cronbach’s alpha (1951) is an effective measure of error variance regardless of test length, but not always a good measure of internal consistency, as internal consistency of a scale is strongly correlated with the number of items in that scale.
Coefficient alpha (Cronbach, 1951) is certainly one of the most important and pervasive statistics in research involving test construction and use, and is found in experimental psychology, sociology, statistics, medicine, counseling, nursing, economics, political science, criminology, gerontology, broadcasting, anthropology, and accounting.
The estimate of reliability that one uses must depend on the sources of variance that one considers relevant. If error factors associated with the passing of time are of interest, then test-retest or multiple administrations of parallel tests may be used.
If error factors associated with the use of different items are of interest, then internal consistency estimates, such as coefficient alpha (which takes into account variance attributable to subjects and variance attributable to the interaction between subjects and items), may be used.
Internal consistency refers to the degree of interrelatedness among the items whereas homogeneity refers to unidimensionality.
In one example, alpha was high in spite of the fact that one third of the item intercorrelations were zero. So, one conclusion that can be drawn with respect to what alpha measures is this: It is a function of the extent to which items in a test have high communalities and thus low uniquenesses.
It was mentioned above that a set of items can be somewhat interrelated and multidimensional. This is not so much an issue for the level of alpha, but rather for the precision of alpha.
Precision is measured in terms of the standard error of the item intercorrelations, which, in turn, is a function of the variance of the item intercorrelations; this standard error is what gives the precision of alpha.
A large standard error, although it does not provide enough information by itself to prove multidimensionality, is a symptom of multidimensionality.
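A quick sketch (my numbers, illustrating the level-vs-precision distinction): two 6-item scales with the same mean intercorrelation, and thus the same level of alpha, can have very different spreads of item intercorrelations, and thus very different precision.

```python
import numpy as np

# 15 pairwise correlations for a 6-item scale (6 choose 2 = 15).
uniform = np.full(15, 0.30)                              # unidimensional-looking
mixed = np.array([0.60] * 5 + [0.30] * 5 + [0.00] * 5)   # same mean, lumpy

for rs in (uniform, mixed):
    # Same level (same mean r -> same standardized alpha), different precision:
    print(round(rs.mean(), 2), round(rs.std(ddof=1), 2))
# 0.3 0.0  vs.  0.3 0.25 -- the large spread hints at multidimensionality
```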
Cronbach's alpha is an estimate of reliability, not reliability per se, because as the items in tests approach essential tau-equivalence (i.e., linearly related and differing only by a constant), as they do when the tests are composed of equal portions of general and group factor variance, Cronbach's alpha approaches reliability.
Most recent studies that have used alpha imply that a given level, perhaps greater than .70, is adequate or inadequate without comparing it with the number of items in the scale. THIS IS BAD BECAUSE:
simply having many items can drive up alpha!
This is not to say that the absolute level of alpha is meaningless. The proportion of error variance for a scale with alpha = .80 is exactly the same for any test regardless of the number of items.
Third, the precision of alpha (the standard error of the correlations in the item intercorrelation matrix) offers far more information about dimensionality than the size of alpha.
A scale can have a reasonable alpha even if it contains three orthogonal dimensions. To be fair, though, alpha does increase as a function of item intercorrelation, and alpha does decrease as a function of multidimensionality. But a LARGE alpha does not mean you have a unidimensional test!
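A sketch in the spirit of Cortina's demonstrations (the specific numbers are mine): 18 items in three mutually orthogonal clusters still produce a "respectable" alpha.

```python
import numpy as np

k = 18
R = np.zeros((k, k))
for block in range(3):
    i = slice(block * 6, (block + 1) * 6)
    R[i, i] = 0.70          # r = .70 within each 6-item cluster
np.fill_diagonal(R, 1.0)    # 0 between clusters, 1s on the diagonal

# Cronbach's alpha from a correlation matrix:
# alpha = k/(k-1) * (1 - trace(R) / sum(R))
alpha = k / (k - 1) * (1 - np.trace(R) / R.sum())
print(round(alpha, 2))      # ~0.82 despite three orthogonal dimensions
```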
Lee et al. (2011)
What is qualitative research? 4 THINGS:
First, it occurs in natural settings (i.e., ecological validity is less limited than in experiments conducted with substantial control in the lab). Generally speaking.
Second, qualitative data derive from the participants’ perspective. Typically, the researcher should not impose immediate interpretations. After substantial analysis, however, theoretical inductions (e.g., grounded theory) and/or particular interpretations (e.g., critical theory) become quite legitimate.
Third, qualitative research should be reflexive (i.e., flexible). Qualitative designs can be readily changed to fit the fluid or dynamic demands of the research setting. In our view, this attribute most easily differentiates qualitative from traditional quantitative research, which might be characterized as more rule driven or algorithmic (e.g., experiments, survey research).
Fourth, qualitative instrumentation, observation methods, and modes of analysis are not standardized. Instead, the individual psychologist serves as the main research instrument. (Although many recent and impressive software packages greatly facilitate data analysis, these packages do not yet substitute for the qualitative researcher's insight and inductive reasoning.) By contrast, quantitative research can easily take place in the lab, in field settings, or both; it more commonly imposes a particular theoretical perspective through which to understand data; it has strong consistency standards (e.g., experimental controls) to eliminate alternative explanations; and it requires more rigorous reliability and validity standards.
Lee et al. (2011)
Two major themes characterize much of the published qualitative research.
First, it is often a process of data reduction that simultaneously enhances and leads to inductions about theory. In other words, a vast amount of subjectively gathered data (e.g., context-specific observations) is logically and consistently reduced to enhance interpretive coherence.
Second, and noted previously, qualitative research involves few, if any, standardized, well-researched, or otherwise objectively observable instruments.
By analogy, qualitative research is like EFA: both take large amounts of data and reduce them to a more meaningful whole. With EFA, latent traits are often inferred; with qualitative research, coherent categories are often induced.
QUAL can still be used for hypothesis testing: Lee et al. (1996) tested specific hypotheses deduced from the unfolding model of voluntary turnover (Lee & Mitchell, 1994) on a sample of nurses. In particular, the unfolding model holds that turnover occurs through four distinct prototypical paths.
Three common types of QUAL designs:
(Lee et al., 2011)
Case study research, ethnography, and in-depth interview studies are the THREE MAIN TYPES OF RESEARCH DESIGNS
Case study research
First, it necessarily seeks to generate, elaborate, or test scientific theories and models. Second, case study research can occur within an in-depth investigation of a single case (e.g., the common "N = 1" situation) or across in-depth investigations of multiple cases. Third, it most often seeks to generate testable research propositions.
Lee et al. (2011) - 3 Limitations on qualitative research
First, it is very time intensive. As scholars, we have multiple demands on our time, and qualitative research may simply require too much time, especially in the early years of one’s career.
Second, it may be professionally risky: its time demands may slow other research projects, and because qualitative research isn't as conventional as quantitative work, it may be more challenging to publish.
Third, some organizational scholars claim we have too much theory. As mentioned earlier, the vast majority of qualitative research generates or elaborates theory.
More theory testing through qualitative means, however, might help counter this issue
It may also be hard to apply in areas where it is less common; for example, training and performance appraisal have been examined mostly with quantitative procedures.
Spector (2019)
There are two often expressed concerns with the cross-sectional design: common method variance and the inability to draw causal conclusions. The remedy for both of these concerns most often suggested is using a longitudinal (all variables are assessed at all time points) or prospective (different variables are assessed at different time points) design to introduce the element of time.
Spector (2019)
The unmeasured variables problem: for example, models where mood is not included (e.g., finding perceptions of supervision correlated with job satisfaction and not knowing it's because mood increases both of those variables). This is NOT a problem of common method variance.
If, however, mood has no impact on the underlying constructs, but merely affects their assessment, mood would be biasing those assessments and would serve as a source of common method variance, that is, an unintended influence on the assessment of the variables of interest.
Also, comparisons of corresponding cross-sectional versus longitudinal correlations in meta-analyses do not uniformly find larger correlations from cross-sectional designs (e.g., Nixon et al., 2011).
John Stuart Mill's necessary requirements for causation
1. Proposed cause and effect are related.
2. Proposed cause occurs prior to effect.
3. We can rule out feasible alternative explanations for observations of 1 and 2.
Spector adds a fourth element, the need to articulate a mechanism through which the cause can lead to an effect: 4. Proposed cause works through an articulated mechanism. Element 1 (covariation) is easy to establish through either cross-sectional or longitudinal designs.
Element 2 is not actually obvious from a longitudinal design: although such a design can provide a measurement of X before a measurement of Y, that is not the same thing as assessing X prior to Y happening and Y after X has occurred. This is because our studies rarely assess discrete events; we do not generally know when the levels of our X and Y variables were achieved, or which occurred prior to the other.
If we measure too soon, we will fail to detect the effects of X on Y because the process has not been completed. To provide an adequate test of the effects of X on Y, we would need to know how long the lag is between X occurring and Y happening (say, between starting to smoke and developing lung cancer).
For example:
If we wish to study the connection between workload and strain (e.g., among accountants, whose workload peaks around April tax deadlines), we need to look at their concurrent levels. If we conduct a cross-lag analysis with a two-wave study, we would be testing whether October workload predicts April strain, which in this case is nonsensical because the lag between workload and strain is not 6 months.
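For concreteness, a minimal sketch (hypothetical data and variable names, not Spector's) of what such a two-wave cross-lagged test amounts to:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
workload_oct = rng.normal(size=n)
strain_oct = 0.5 * workload_oct + rng.normal(size=n)
# April strain depends only on October strain here -- no 6-month workload lag.
strain_apr = 0.6 * strain_oct + rng.normal(size=n)

# Does October workload predict April strain beyond October strain?
X = np.column_stack([np.ones(n), strain_oct, workload_oct])
betas, *_ = np.linalg.lstsq(X, strain_apr, rcond=None)
print(betas)  # [intercept, autoregressive strain, cross-lagged workload]
```

A near-zero cross-lagged coefficient here would say nothing about whether workload causes strain at the true, much shorter lag.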
Merely choosing some arbitrary time points does not provide more definitive evidence than does a cross-sectional design and might well lead to erroneous inference as in the accountant example.
The use of a cross-sectional design would be inappropriate in cases where equilibration (the process by which X's effect on Y fully plays out) has not yet occurred, for example, if we assess X and Y at the same time but Y has not yet happened. For instance, if we want to study the effects of smoking on heart disease, it would not be fruitful to conduct a cross-sectional study of 20-year-olds, because there has not been sufficient time for young smokers to have developed the disease.
Even randomized experiments are plagued with potential demand characteristics and experimenter effects, and limitations to generalizability, not to mention uncertainty about how well the intervention actually moved ("wiggled") the intended X variable.
Qualitative methods might include focus groups, interviews, or surveys where people are asked to recall specific events and when they occurred. With interviews, in-depth discussions can include not only the order of events, but also informants’ explanations and interpretations of events that can be helpful in articulating tentative mechanisms deserving of further testing.
An advantage of using an alternative source is that it can serve as a control for some sources of method variance (Podsakoff et al., 2012). Yet, different sources can share biases when those sources are in contact with one another, such as employees and their supervisors (Spector, 2006).
The main issue in deciding to include an alternative source is whether the interest is in only the subjective view of the individual, or in a more objective feature of the person or workplace. Some phenomena concern internal states of people that would be difficult for an alternative source to assess anyway.
SUPER INTERESTING - Experimental approaches: Cross-sectional studies can be experimental in that X and Y are assessed under different conditions to rule out the effects of potential third variables that can be manipulated. For example, if one suspects that mood might serve as a common cause of both X and Y, one could measure X and Y under varying conditions of mood: in one condition, manipulate mood and measure the X-Y correlation; in the other, omit the mood manipulation and still measure the X-Y correlation.
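One way to analyze such a design (my sketch; the rs and ns are hypothetical) is a Fisher z test of whether the X-Y correlation differs between the mood-manipulation and control conditions:

```python
import numpy as np

# Fisher z test for a difference between two independent correlations.
def fisher_z_diff(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

z = fisher_z_diff(r1=0.42, n1=100, r2=0.38, n2=100)
print(round(z, 2))  # ~0.33: little evidence that mood changes the X-Y relation
```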
Spector (2019) - when to use cross sectional
Exploratory research
Testing new relationships, or old relationships in new contexts (moderators)
When you don’t know the proper lag times for longitudinal.
A cross-sectional design could be utilized where the X occurred prior to the survey and is assessed with retrospective questions in the survey. Finding differences on Y for people who experienced versus did not experience a merger has its limitations, but it can provide hints that mergers might have long-lasting effects and that such effects are worthy of further study.
You are interested in ruling out alternative explanations for covariation. Example - if the relationship between x and y remains the same whether including b as a control variable or not, you can discern that b is not a good alternative explanation.
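A toy sketch of that logic (hypothetical variable names and simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
b = rng.normal(size=n)
x = 0.3 * b + rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # b has no direct effect on y in this toy world

X_without = np.column_stack([np.ones(n), x])
X_with = np.column_stack([np.ones(n), x, b])
beta_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
# If x's coefficient barely moves when b is controlled, b is not a good
# alternative explanation for the x-y relationship.
print(round(beta_without[1], 2), round(beta_with[1], 2))
```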
Spector (2019) - When to use longitudinal:
- You wish to test the effects of an intervention: get a baseline measure and then assess the outcome once or, ideally, multiple times after the intervention to provide insight into how its effects unfold over time (are they transitory or long-lasting?).
- You wish to test the effects of an experience that occurs between waves.
- Discrete events can be good candidates for longitudinal investigation, particularly if you can compare individuals who have and have not experienced the event prior to the study. (There's less vacillation in discrete events, so you don't have to worry as much about the problem of arbitrary time points.)
- You know how long the time lag will be between the X and Y variables.
Spector (2019) - When you do cross-sectional, make it strong:
- Present a systematic analysis strategy that tells a compelling story. This might mean first establishing a clear relationship between X and Y, and then ruling out feasible alternative explanations, and/or illustrating boundary conditions through the use of moderator tests.
- If possible, incorporate a time element into the design. This could involve a retrospective event history.
- Or, it is possible to compare people who had versus did not have some experience in the past to see if it is related to an important variable at the present time. Even though this is a concurrent measure of all variables, it links what happens in the past (assuming people can accurately report, so ideally a significant event like being fired) with something in the present.
- Surveys can be designed to ask people for their judgments about the causes of events. Those judgments can be checked for consensus and compared with other forms of evidence to build a causal case.
Murphy & Aguinis (2019)
HARKING -
The reality that a large number of correlations are in the 0.10s means that a HARKing effect of about 0.10 correlation units could essentially double the observed effect. In other words, there is a potential for 100% inflation in the observed effect when the population effect is itself small.
As N decreases and the standard error increases, the likelihood that the most favorable study result (i.e., statistically significant and large) will deviate sharply from the population effect it estimates increases, meaning that there should be systematically more bias when N is small.
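A small simulation (mine, not the authors' code) of that small-N bias: report the largest of k candidate correlations, as a HARKer might, and watch the inflation grow as N shrinks.

```python
import numpy as np

# True rho = .10 for every candidate effect; the HARKer reports the max of k rs.
rng = np.random.default_rng(2)
rho, k, reps = 0.10, 10, 500

for n in (50, 200, 1000):
    best = []
    for _ in range(reps):
        x = rng.normal(size=(k, n))
        y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=(k, n))
        rs = [np.corrcoef(x[j], y[j])[0, 1] for j in range(k)]
        best.append(max(rs))
    print(n, round(float(np.mean(best)), 2))
# The mean "best" r sits far above .10 at n = 50 and shrinks toward .10 as n grows.
```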
When HARKing involves cherry-picking, which consists of searching through data involving alternative measures or samples to find the results that offer the strongest possible support for a particular hypothesis or research question, HARKing has only a small effect on estimates of the population effect size.
When HARKing involves question trolling, which consists of searching through data involving several different constructs, measures of those constructs, interventions, or relationships to find seemingly notable results worth writing about, HARKing produces substantial upward bias particularly when it is prevalent and there are many effects from which to choose.
Less problematic forms of HARKING—little potential to bias cumulative knowledge:
- Hypothesis proliferation: An author adds hypotheses to a study after data are collected and analyzed to place added emphasis on a result that was not part of the original conceptual design but was nevertheless going to be reported in the manuscript (e.g., correlation table).
- THARKing: An author is likely to transparently HARK in the discussion section of a paper by forming new hypotheses on the basis of results obtained (Hollenbeck & Wright, 2017). - For example, authors may describe in an article’s discussion section that particular hypotheses were based on the data they collected for their study. Hollenbeck and Wright (2017) argued that THARKing is not only ethical, but also likely to be beneficial.
Prob all you need from this paper.
Cherry-picking involves selectively reporting the most favorable results from different samples or measures that are all designed to address the same research question, which implies that the population effect all of these different sample results estimate is the same. That is, cherry-picking involves homogeneous effects. In contrast, question trolling involves heterogeneous effects: even the choice of what to study is driven by sample results, and the underlying population effects are heterogeneous (it's a mixed bag they're grabbing from).
Cherry-picking's impact is generally small: except when HARKing is very prevalent and the sample size is small, cherry-picked results have a small biasing impact on effect size estimates.
There are several actions that can be taken to minimize the detrimental effects of HARKing. For example, large sample sizes reduce the bias produced by cherry-picking.
Bedeian et al. (2010)
Over 90% of respondents in Bedeian et al.'s (2010) survey of research practitioners in management indicated they had knowledge of faculty members who had developed hypotheses after results were known.
Murphy & Aguinis (2019)
To prevent/deter HARKING
First, Occam’s razor is an essential tool for detecting HARKing. As Hollenbeck and Wright (2017) noted, HARKed hypotheses often involve convoluted reasoning or counterfactual assumptions.
Second, it is useful to have a healthily skeptical attitude; stories that are too good to be true may not be true. - It is unusual for every prediction and every hypothesis to be supported in a study.
HARKing is not simply a problem of author misbehavior. It is common for reviewers and editors to encourage authors to drop hypotheses and analyses that do not pan out, and this creates problems that have a good deal in common with HARKing.
Murphy & Aguinis (2019) - On how different models are similar to HARKing
Multiple regression has possible similarities to HARKING:
- Many variable selection methods are available to researchers, including forward selection algorithms (where variables are added to a regression model until they fail to produce incremental increases in R²).
Serious concerns have been raised about building prediction models on the basis of purely statistical criteria like these, including sensitivity to fluctuations in the data and poor replicability.
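A generic sketch of the kind of forward selection algorithm described above (my own illustration, not code from the paper):

```python
import numpy as np

def r_squared(X, y):
    # R^2 from an OLS fit with an intercept.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

def forward_select(X, y, min_gain=0.01):
    """Greedily add the predictor giving the biggest R^2 gain until the
    gain drops below min_gain. Purely data-driven -- hence the
    replicability worries noted above."""
    chosen, r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = {j: r_squared(X[:, chosen + [j]], y) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] - r2 < min_gain:
            break
        chosen.append(best)
        r2 = gains[best]
        remaining.remove(best)
    return chosen, r2
```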