Stats: Methods Flashcards

1
Q

• Organizations are multilevel, but have not been conceptualized as such in research.
• A multilevel approach to theory and research in organizations: contextual, temporal, and emergent processes explained.
• The 'levels perspective' is based on the interactionist view that behavior is a function of both person and situation.
• A meaningful understanding of organizational behavior necessitates approaches that are more integrative, cut across multiple levels, and seek to understand phenomena from a combination of perspectives. A levels perspective offers a paradigm that is distinctly organizational.

A

Kozlowski & Klein (2000) – levels/multilevel issues

2
Q

• Within-group agreement: rwg – calculated by comparing observed group variance to expected random variance (rwg(j) used for multiple-item scales; rwg designed for single items) → its magnitude doesn't depend on between-group variance & it provides a measure of agreement for each group rather than an overall measure across all groups (formulas below)
• Reliability: a measure assessing the relative consistency of responses among raters – the degree to which ratings are consistent when expressed as deviations from their means
  o In multilevel org research, reliability is commonly assessed by ICC(1) (proportion of total variance explained by group membership) and ICC(2) (estimate of the reliability of group means)
• Non-independence: the degree to which responses from individuals in the same group are influenced by, depend on, or cluster by group (ICC(1))
• Eta-squared: calculated from a 1-way random-effects ANOVA, where group membership is the IV and the construct of interest is the DV – this statistic is heavily influenced by group size!
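The standard forms of these aggregation statistics, as a reference sketch (notation follows the usual one-way random-effects ANOVA setup with groups of size k; the multi-item rwg(j) generalizes the single-item version shown here):

```latex
% ICC(1) and ICC(2) from a one-way random-effects ANOVA
\mathrm{ICC}(1) = \frac{MS_{B} - MS_{W}}{MS_{B} + (k-1)\,MS_{W}}
\qquad
\mathrm{ICC}(2) = \frac{MS_{B} - MS_{W}}{MS_{B}}

% Single-item within-group agreement for group j, where s^{2}_{xj} is the
% observed within-group variance and \sigma^{2}_{EU} is the variance expected
% under a null (e.g., uniform) response distribution
r_{wg} = 1 - \frac{s^{2}_{xj}}{\sigma^{2}_{EU}}
```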

A

Bliese (2000)

3
Q

Three views of multi-level research and issues
• Emergent view: certain concepts apply only at higher aggregate levels and not at lower levels (e.g., institutional characteristics); only aggregate scores are relevant
• Cross-level view: attraction-selection-attrition (e.g., norms); both individual and aggregate scores are relevant
• Individual view: the individual level is relevant; the organizational level is irrelevant (e.g., personality)
• Because ICC is based on ANOVA, an F-test can provide an indication of ICC
• rwg should be interpreted along with an F-test
• ANOVA should be used as a preliminary step in multilevel HLM analyses

A

Dansereau & Yammarino (2006)

4
Q

• It has long been known that most, if not all, range restriction in real data is indirect and that direct range restriction is quite rare
• It has been shown that when range restriction is indirect, corrections for direct range restriction lead to substantial undercorrection and thus underestimate the true validity
• On average, correcting for direct range restriction resulted in substantial underestimation of operational validities for both job performance measures (21%) and training performance measures (28%)
• Many relationships in the I/O literature may be stronger than they have been estimated to be in the past on the basis of the correction for direct range restriction instead of the indirect approach advocated by Hunter & Schmidt → previous results could be viewed as a lower bound rather than the "average" (the classic direct correction is sketched below)
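For reference, the classic correction for direct range restriction (Thorndike Case II) that the article argues is routinely misapplied to indirectly restricted data; U is the ratio of unrestricted to restricted SDs on the predictor:

```latex
\hat{\rho} \;=\; \frac{U\,r}{\sqrt{(U^{2}-1)\,r^{2}+1}},
\qquad
U = \frac{SD_{\text{unrestricted}}}{SD_{\text{restricted}}}
```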

A

Schmidt et al. (2006) – range restriction

5
Q

• Longitudinal data are challenging – there is probably a degree of nonindependence of responses, meaning that a set of responses is likely to be correlated, and responses close together in time may be more strongly correlated than those far apart.
• The regression approach has problems because it fails to account for the fact that responses are nonindependent. It also assumes that all individuals start at the same level and change at the same rate.
• Alternative: the random coefficients model (RCM) – a compromise between 1) a regression model that ignores the fact that observations are nested within individuals, and 2) a series of regression models that estimate a separate model for each individual.
• Describes the RCM steps in detail: level 1 (estimating the basic model in R) and level 2 (modeling intercept and slope variation) – see the sketch below.
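Bliese & Ployhart work through these steps in R; a rough Python equivalent of the basic random coefficients (random intercept and slope) model, as a minimal sketch with simulated data (variable names `y`, `time`, `id` are placeholders, not from the article):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated longitudinal data: 50 people measured at 4 time points, with
# person-specific intercepts and slopes (i.e., nonindependent responses).
rng = np.random.default_rng(42)
n_people, n_waves = 50, 4
ids = np.repeat(np.arange(n_people), n_waves)
time = np.tile(np.arange(n_waves), n_people)
intercepts = rng.normal(5.0, 1.0, n_people)[ids]
slopes = rng.normal(0.5, 0.3, n_people)[ids]
y = intercepts + slopes * time + rng.normal(0.0, 0.5, n_people * n_waves)
df = pd.DataFrame({"id": ids, "time": time, "y": y})

# Random coefficients model: fixed effect of time, plus a random intercept
# and a random slope for time within each person.
rcm = smf.mixedlm("y ~ time", data=df, groups=df["id"], re_formula="~time")
result = rcm.fit()
print(result.summary())  # fixed effects and variance components
```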

A

Bliese & Ployhart (2002)

6
Q

• Power analyses reveal that samples typically need to be larger than 30 to have adequate power
• We should identify few independent variables and even fewer dependent variables (more will increase Type I error and redundancy in criterion relevance)
• A frequency polygon or stem-and-leaf plot (graphical display) is more telling of the data distribution
• Fisherian legacy → p = .05 is arbitrary
• The null hypothesis taken literally is always false in the real world
• Effect sizes can have p values, but a confidence interval is more informative
• The problem with using beta weights is that they are specific to the sample, and we usually want to predict for the population or future samples
• Use unit weights (+1 for positive predictors, −1 for negative, 0 for useless) in multiple regression – sample size doesn't affect the unit-weighted correlation (sketch below)
• Rejection of a given null gives us no basis for estimating the probability that a replication of the research will again result in rejecting that null
• The product of research inquiry is a measure of effect size, NOT a p value
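A small illustration of the unit-weights point, as a sketch with simulated (not article) data: regression weights are tuned to the derivation sample, while a unit-weighted composite often holds up about as well when applied to new data.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, n_pred=4):
    # Criterion built from equal true weights on each predictor plus noise.
    x = rng.normal(size=(n, n_pred))
    y = x.sum(axis=1) + rng.normal(scale=2.0, size=n)
    return x, y

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

x_dev, y_dev = simulate(60)        # small derivation sample
x_new, y_new = simulate(10_000)    # large "future" sample

# OLS weights estimated in the derivation sample (intercept dropped).
beta = np.linalg.lstsq(np.column_stack([np.ones(len(y_dev)), x_dev]),
                       y_dev, rcond=None)[0][1:]
unit = np.ones_like(beta)          # +1 for each positively scored predictor

print("derivation: OLS composite r =", round(r(x_dev @ beta, y_dev), 3),
      "| unit composite r =", round(r(x_dev @ unit, y_dev), 3))
print("new sample: OLS composite r =", round(r(x_new @ beta, y_new), 3),
      "| unit composite r =", round(r(x_new @ unit, y_new), 3))
```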

A

Cohen (1990)

7
Q

• SEM (a.k.a. LISREL models): seeks to represent hypotheses about the means, variances, and covariances of observed data in terms of a smaller number of 'structural' parameters defined by a hypothesized underlying model – it is NOT in itself a causal model
• Most prominent feature: deals with latent variables, which are nonobservable quantities or factors underlying observed variables
• Allows analysis of dependencies among psychological constructs without measurement error
• SEM compares the model to empirical data, which leads to fit statistics assessing the match of the model to the data
  o If fit is acceptable, the measurement model and structural model are regarded as being supported
• Sample size should usually be more than 25 times the number of parameters to be estimated, with a minimum subject-to-parameter ratio of 10:1, and should be at least 200 → SEM requires large samples

A

Nachtigall et al. (2003) – SEM

8
Q

Approaches to mediation testing (3 broad approaches):
• Causal steps / joint significance approach (e.g., Baron & Kenny, 1986)
  o Establishes conditions for mediation but does not provide sufficient evidence for causal effects
  o No estimate of effect size
• Difference in coefficients approach (DCA) (e.g., Clogg et al.; Freedman & Schatzkin)
  o Looks at the difference between regression/correlation coefficients for the IV–DV relation before and after adjusting for the mediating variable → provides an estimate of the mediated effect and its standard error
• Product of coefficients approach (PCA) (e.g., MacKinnon et al., 2002)
  o Divides the estimate of the mediated effect by its standard error and compares this value to a normal distribution to test significance (Sobel formula; see below)
• The study ran a Monte Carlo simulation of all 14 mediation methods across various sample sizes and effect sizes
➢ The MacKinnon et al. (PCA) and Clogg et al./Freedman & Schatzkin (DCA) methods had the most accurate Type I error rates and the greatest statistical power – but their assumptions are not very conceptually strong
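The product-of-coefficients (Sobel) statistic referenced above, in its usual first-order form, where a is the IV→mediator path, b is the mediator→DV path controlling for the IV, and s_a, s_b are their standard errors:

```latex
z \;=\; \frac{\hat{a}\,\hat{b}}{\sqrt{\hat{b}^{2}s_{a}^{2} + \hat{a}^{2}s_{b}^{2}}}
```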

A

MacKinnon et al. (2002) – 14 mediation methods

9
Q

• Inadequate statistical power means that even if significant effect sizes exist in the population, they will be overlooked; the progress of theories is slowed
• Overpowered samples (using excessively large samples) increase oversensitivity to trivial findings
• Insufficient power: researchers' time and effort may be wasted by not finding actual effect sizes
• Determinants of power: 1. the significance criterion, alpha (the long-run probability of erroneously rejecting the null hypothesis), 2. sample size, and 3. effect size (reflects the magnitude of a phenomenon in a population)
• 210 articles from management journals: small effect sizes (average power across tests .27); medium (.74); large (.92)
• Journal editors and reviewers should make explicit the need to conduct and report power analysis (many researchers are aware of power analysis but choose not to use it) – see the sketch below
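A minimal sketch of the kind of prospective power analysis the authors call for, using statsmodels for an independent-samples t-test with Cohen's conventional effect sizes (the specific design would of course vary by study):

```python
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

# Power at n = 30 per group and alpha = .05 for small/medium/large effects.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    achieved = power_calc.power(effect_size=d, nobs1=30, alpha=0.05)
    print(f"{label:6s} d={d}: power with n=30/group = {achieved:.2f}")

# Required n per group to reach power = .80 for a medium effect.
n_needed = power_calc.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print("n per group for .80 power at d = .5:", round(n_needed))
```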

A

Mone et al. (1996) – power analysis

10
Q

• The foundation for all longitudinal analysis is a statistical model that defines the parameters of change for the trajectory of a single participant → comparing people then becomes the task of comparing the parameters of the personal trajectories (see the model below)
• Longitudinal research: various statistical approaches can be used to do this, including covariance component models, HLM, latent curve analysis, multilevel models, random-coefficient models, and mixed models
• Model 1: define the trajectory for person i with a linear model and then describe the population distribution of personal trajectories
• Model 2: the causal effect of an experimental treatment can be defined as Effect_i = Y_i(E) − Y_i(C) (the effect is the difference between the score one individual would get under the treatment versus the control condition) – defined uniquely for each person
• Model 3: add a pretest (and a pre-pretest for 3 data points – even better!) to compute gain scores (the difference in gain scores between people should be less biased than the difference in post-test scores only)
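The individual-trajectory model from Model 1, written out in the usual two-level linear-growth form (standard notation, not taken verbatim from the article):

```latex
% Level 1: observed score for person i at time t follows a personal trajectory
Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{time}_{ti} + e_{ti}

% Level 2: population distribution of the personal intercepts and slopes
\pi_{0i} = \beta_{00} + u_{0i}, \qquad
\pi_{1i} = \beta_{10} + u_{1i}, \qquad
(u_{0i}, u_{1i}) \sim N(\mathbf{0}, \mathbf{T})
```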

A

Raudenbush (2001)

11
Q

• In meta-analyses we often find effects for relevant relationships that can't be used because they are reported in a different metric → the prevailing view is that beta coefficients shouldn't be used in meta-analysis
• A beta coefficient is a partial coefficient that reflects the influence of all predictor variables in a multiple regression model
• Omitting available effect sizes (such as beta coefficients) increases nonsampling error
• The study compared corresponding beta coefficients (β) and correlation coefficients reported in journals → the relationship between the two is quite robust (.70)
• A formula is proposed to convert β to r values: the correlation between observed and converted rs was .65 (the conversion formula is also robust) → this yields more accurate and more precise estimates of population effect sizes than imputing zeros or observed means for missing effect sizes; ↓ sampling error (see the sketch below)
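A sketch of how such a conversion would be applied before pooling. The additive-adjustment form below (r ≈ β + .05λ, with λ = 1 for nonnegative β and 0 otherwise) is the version commonly attributed to Peterson & Brown (2005); treat the constants as an assumption here and verify them against the original article before use.

```python
def beta_to_r(beta: float) -> float:
    """Approximate a zero-order r from a standardized regression coefficient.

    Additive-adjustment form commonly attributed to Peterson & Brown (2005):
    r ~= beta + .05 * lambda, where lambda = 1 for nonnegative betas and 0
    for negative betas. Typically recommended only for betas of modest size.
    """
    lam = 1.0 if beta >= 0 else 0.0
    return beta + 0.05 * lam

# Example: a study reports beta = .32 where other studies report r; convert
# so the effect can enter the meta-analytic pool instead of being dropped
# (or imputed as zero).
print(beta_to_r(0.32))   # ~0.37
print(beta_to_r(-0.20))  # -0.20
```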

A

Peterson & Brown (2005)

12
Q

• 2 approaches to missing data: maximum likelihood (ML) estimation based on all available data, and Bayesian multiple imputation (MI)
• Missing data types: unit nonresponse (refuses to participate); item nonresponse (skips questions); wave nonresponse (responds in some waves of collection but not others); attrition or dropout
• Other methods of dealing with missing data: case deletion; available-case analysis (also called pairwise deletion or pairwise inclusion); reweighting remaining cases; averaging the available items; single imputation (replace a missing item with a plausible value – e.g., use regression to replace the missing value with a random draw from the predictive distribution of Y given X)
• Maximum likelihood estimation: the distribution of the observed data provides the correct likelihood for the unknown parameters
• MI: each missing value is replaced by a list of simulated values, giving many alternative versions of the complete data (reflects missing-data uncertainty); results across imputations are then pooled (see the sketch below)
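A minimal sketch of the MI logic: analyze each of the m completed data sets, then pool with Rubin's rules. The per-imputation estimates and within-imputation variances below are placeholder numbers standing in for whatever model is fit to each imputed data set.

```python
import numpy as np

# Suppose a regression coefficient was estimated in each of m = 5 imputed
# data sets: point estimates q_i and their squared standard errors u_i.
q = np.array([0.42, 0.39, 0.45, 0.40, 0.44])       # per-imputation estimates
u = np.array([0.010, 0.011, 0.009, 0.010, 0.012])  # within-imputation variances
m = len(q)

q_bar = q.mean()              # pooled point estimate
w = u.mean()                  # average within-imputation variance
b = q.var(ddof=1)             # between-imputation variance
t = w + (1 + 1 / m) * b       # total variance (Rubin's rules)

print(f"pooled estimate = {q_bar:.3f}, pooled SE = {np.sqrt(t):.3f}")
```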

A

Schafer & Graham (2002)

13
Q

• Misconceptions about null hypothesis significance testing (NHST): that p is the probability that H0 is true; that p's complement is the probability of successful replication; that if one rejects H0 one thereby affirms the theory that led to the test
• The author suggests: exploratory data analysis, use of graphic methods, movement toward standardization in measurement, emphasis on estimating effect sizes using confidence intervals, and relying on replication for purposes of generalization
• Rejecting the null hypothesis does not imply that the theory is established → as long as one has a large enough dataset, H0 can be shown false
• A correct interpretation of p values only tells us that A is larger (or smaller) than B, but never how much larger or smaller (a confidence interval or some other effect size measure can be much more informative!)

A

Cohen (1994) – "The Earth Is Round (p < .05)"

14
Q

• Goal: to present methods for making accurate corrections when range restriction is indirect, in the context of meta-analysis
• Direct range restriction: applicants are selected directly on test scores
• Indirect range restriction: applicants are selected on a different variable that is correlated with test scores
• Underestimation of the predictive validity of selection procedures is likely in meta-analyses
• If range restriction is indirect, the Hunter and Schmidt (1990) formula will undercorrect
• Procedure: first correct for measurement error in both variables using restricted-sample reliability values, then correct for range restriction using the traditional formula, but using u_T (the true-score range restriction ratio) rather than u_X as the index of range restriction (see the sketch below)
• Can be used in individual studies as well
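A sketch of the two-step procedure as summarized above (disattenuate with restricted-sample reliabilities, then apply the traditional correction using u_T). The correction formula is the classic one; u_T is taken as given here rather than derived, so consult the article for how u_T is obtained from u_X.

```python
import math

def correct_indirect_rr(r_obs: float, rxx_restricted: float,
                        ryy_restricted: float, u_t: float) -> float:
    """Two-step correction sketch.

    r_obs: observed (restricted, attenuated) correlation
    rxx_restricted, ryy_restricted: reliabilities in the restricted sample
    u_t: true-score range restriction ratio (restricted SD / unrestricted SD)
    """
    # Step 1: correct for measurement error in both variables.
    r_true = r_obs / math.sqrt(rxx_restricted * ryy_restricted)

    # Step 2: traditional range restriction correction, applied with u_T.
    big_u = 1.0 / u_t
    return (big_u * r_true) / math.sqrt((big_u**2 - 1) * r_true**2 + 1)

# Illustrative (made-up) values: observed validity .25, modest reliabilities,
# substantial true-score restriction.
print(round(correct_indirect_rr(0.25, 0.80, 0.70, 0.60), 3))
```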

A

Hunter et al. (2006) – range restriction

15
Q

• Standardized item alpha: an index of internal consistency related to Cronbach's alpha
• Skewed distributions are a source of non-normality
• Effects of skew on reliability estimates: direct (skew attenuates correlations) and indirect (skew may produce factors that adversely affect measures of internal consistency)
• 2 simulation studies (continuous items, Likert-type items)
• Skew produced decreases in alpha and decreases in the average inter-item correlation
• When using Likert-type scaling, it is recommended to use more levels; the negative effects of skew are reduced as the number of levels per item increases
• Transforming skewed variables may reduce the negative effects of skew

A

Greer et al. (2006)

16
Q

• Small sample size is a primary cause of low statistical power in applied research
• Reanalysis of Peterson et al. (2003) (a study of CEO personality, team dynamics, and firm performance) to assess the degree to which its conclusions would change by removing 1 subject from the sample or by using a more traditional probability level
• 17 different tests with a sample size of 16 subjects were performed as part of a simulation; the new set of inferences was substantially different from that originally published
• A quantitative approach to the original data set, given its small sample size, creates major difficulties when it comes to drawing rigorous inferences

A

Hollenbeck et al. (2006)

17
Q

• Reply to Hollenbeck et al.'s (2006) comment
• The authors admit they should have explicitly discussed the small sample as a concern for the stability of their results
• But it was better to publish a study with low power and unstable parameter estimates that was based on good theory and was consistent with prior empirical observations
• To increase knowledge, it may be desirable to relax traditionally stringent statistical constraints (relating to power and alpha levels) to allow research to be conducted in under-explored research areas
• The benefits of publishing a study in an under-researched area substantially outweigh the risks of misinterpretation
• Overall, Hollenbeck et al. did not sufficiently acknowledge the trade-offs associated with an emphasis on statistical power and controlling Type I error

A

Peterson et al. (2006)

18
Q

• 3 common strategies to examine 3-way interactions: plotting, pick-a-point, and subgroup analysis (all have limitations)
• Development of a significance test for slope differences:
  1) Calculate generic formulas for the simple slopes of the relation between X and Y at high and low levels of Z and W
  2) Calculate the difference between any 2 pairs of slopes
  3) Calculate the standard error of the difference between the pair of slopes; to determine whether slopes differ from each other, the slope difference must be put in relation to its standard error
  4) Test whether the ratio of the difference between the pair of slopes to its standard error differs from zero
  (a sketch of this computation is given below)
• Monte Carlo study: the slope difference test is accurate and useful, and allows researchers to test and explore slope differences that may remain undetected by alternative probing techniques
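The core of steps 2–4 can be expressed as a linear combination of regression coefficients: the simple slope of X at given Z, W values is c'b, so a difference between two simple slopes is also c'b, with variance c'Σc. A minimal sketch with simulated data (variable names are placeholders, not from the article):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"x": rng.normal(size=n), "z": rng.normal(size=n),
                   "w": rng.normal(size=n)})
df["y"] = (0.3 * df.x + 0.2 * df.z + 0.1 * df.w
           + 0.25 * df.x * df.z * df.w + rng.normal(size=n))

m = smf.ols("y ~ x * z * w", data=df).fit()

def slope_contrast(z, w):
    """Contrast vector picking out the simple slope of x at (z, w)."""
    c = pd.Series(0.0, index=m.params.index)
    c["x"], c["x:z"], c["x:w"], c["x:z:w"] = 1.0, z, w, z * w
    return c

# Difference between the simple slope at (high z, high w) and (high z, low w).
hi, lo = 1.0, -1.0  # e.g., +/- 1 SD on standardized moderators
c_diff = slope_contrast(hi, hi) - slope_contrast(hi, lo)

diff = float(c_diff @ m.params)
se = float(np.sqrt(c_diff @ m.cov_params() @ c_diff))
print("slope difference =", round(diff, 3), " t =", round(diff / se, 2))
```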

A

Dawson & Richter (2006)

19
Q

• Measurement equivalence is important as a prerequisite for meaningful group comparisons
• 2 classes of methods for detecting differential item functioning (DIF): item response theory (IRT) and confirmatory factor analysis (CFA)
• Simulation comparing the CFA-based MACS (mean and covariance structure) method and the IRT likelihood ratio (LR) approach for detecting DIF using a unidimensional 15-item scale
• CFA and IRT are similar in DIF detection efficacy when implemented using comparable procedures
• Results contradict Meade and Lautenschlager's (2004) article, which found that IRT LR outperformed MACS in small samples
• The most important finding is that the use of free (vs. constrained) baseline models produces superior DIF classifications

A

Stark et al. (2006)

20
Q

Extremely complicated – all about measuring multi-level constructs
• Collective constructs: developed through interaction; can be global (higher level only), shared (e.g., shared perceptions), or configural (the pattern or configuration of individual characteristics within the group)
• Forms of composition: additive, direct consensus, referent shift, dispersion model, process model
Statistics used to justify aggregation:
• rwg: assesses within-group agreement in a particular group (does not include between-group variance in its calculation)
• ICC(1): computes a ratio of between-group variance to total variance in the measure (the degree to which raters are substitutable)
• ICC(2): assesses the reliability of the group mean (is the mean computed across individuals reliable?)
• Within-and-between-analysis (WABA): assesses whether the variance of a variable is primarily within groups, between groups, or both

A

Hofmann (2002) – multi-level issues

21
Q

Guidelines for theory development, data collection, and data analysis in multi-level research
Assumptions:
• Within-group homogeneity – focus on between-group variability
• Individual independence – focus on between-individual variability
• Within-group heterogeneity – focus on within-group variability
Multi-level research results in:
• Cross-level theory
• Mixed effects – effects at different levels
• Mixed determinants – different levels can all have effects
• Multi-level models – patterns replicated across levels

A

Klein et al. (1994) – levels issues

22
Q

Discusses measurement model misspecification
Latent construct models with:
• Reflective indicators: reflect one underlying construct; the most commonly used type of measure
• Formative indicators: measure different facets of the conceptual domain; dropping an item is a problem
• Many constructs are portrayed as reflective but are actually formative (e.g., job performance, transformational leadership)
Negatives of misspecification:
• Relationships between constructs are seriously underestimated
• Type I and Type II error rates increase by just under 20%
• Inaccurate conclusions
• With the exception of RMSEA, goodness-of-fit indices (CFI, GFI, etc.) fail to detect model misspecification
• Measurement model misspecification can lead to biased parameter estimates and an inflated Type II error rate

A

MacKenzie et al. (2005) – measurement model misspecification

23
Q

Goal is to develop a system of personnel decisions that is:
• scientifically and legally defensible
• acceptable for use in the organization
• meets technical standards for quality
Test construction process:
1. Identification of SMEs
2. Establishment and weighting of the content domain
3. Initial item writing
4. Editing the items
5. Selection of a validation sample
6. Item analysis
7. Constructing the final form of the test
8. Setting the cut score on the test
9. Part vs. whole scoring of the test
10. Retesting

A

Muchinsky (2004) – test development

24
Q

Sources of common method bias:
• Source or rater
• Item characteristics
• Item context
• Measurement context
Procedural and statistical remedies exist:
• Obtain predictor and criterion variables from different sources
• Temporal, proximal, psychological, or methodological separation of measurement
• Protecting respondent anonymity and reducing evaluation apprehension
• Counterbalancing question order
• Improving scale items
• Many statistical methods (controlling for latent method factors, using MTMM)

A

Podsakoff et al. (2003) – common method bias

25
Q

2 ethical issues in research:
• Poorly executed/designed research may be unethical
• Hyperclaiming: claiming you will achieve something that your method cannot achieve
• Causism: implying causal links where none are established
• It may also be unethical NOT to conduct a study
• Unethical data practices: made-up data; failing to report exclusion of outliers or subsets; bad data continue to be used in meta-analyses and for replication
• Unethical reporting: intentional or unintentional misrepresentation (of findings or credit), questionable generalizability, self-censoring when conflicting findings result

A

Rosenthal (1994) – ethics

26
Q

3 trends in management research
Decreases in:
• Survey research
• Lab research
Increases in:
• Computer/experimental simulations
• Field studies
• Cross-sectional studies
• Breadth of DVs and analytical approaches
Failure to:
• Triangulate
• Consider different types of validity
• Use multiple sources in data collection
• Fully utilize designs that emphasize internal, external, and construct validity

A

Scandura & Williams (2000) – trends in management research

27
Q

• Hunter & Schmidt (1990) presented new meta-analysis (MA) methods; they emphasized estimation of the variability of population correlations or effect sizes
• Credibility intervals = describe the estimated distribution of population values, VS.
• Confidence intervals = based on sampling error, which depends on the number of studies in the MA (see formulas below)
• Johnson, Mullen, and Salas (1995) concluded that Hunter and Schmidt (1990) gave anomalous results: they argued that in the Hunter & Schmidt (1990) method, increasing the number of studies in a MA does not decrease the SE of the mean correlation or effect size, and therefore does not decrease the CI around the mean correlation or the p value for the mean correlation
• The Johnson et al. conclusion that the Hunter & Schmidt (1990) method produces anomalous results stems from their use of an inappropriate formula for the standard error of the mean correlation
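The two intervals being contrasted, in their usual forms (k = number of studies; SD_r = SD of observed correlations; SD_ρ = estimated SD of population correlations after removing sampling-error variance):

```latex
% Confidence interval around the mean observed correlation
\bar{r} \;\pm\; 1.96\,\frac{SD_{r}}{\sqrt{k}}

% Credibility interval describing the estimated distribution of population values
\hat{\bar{\rho}} \;\pm\; 1.96\,SD_{\rho}
```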

A

Schmidt & Hunter (1999) – meta-analysis; credibility vs. confidence intervals

28
Q

Although method variance is a frequent criticism of research, the data here suggest the problem may be largely mythical.
• Correlations of bias measures with measures of the constructs of interest tended to be small and rarely statistically significant.
• Multitrait-multimethod analysis (Campbell & Fiske, 1959) can help detect the problem if it is present; the emphasis is on measuring two (or more) different constructs with two (or more) different methods.

A

Spector (1987) – method variance

29
Q

• Strong theory: explains, predicts, and delights!
• Theory asks WHY certain connections, certain patterns, and certain variables are expected
• Some journals emphasize theory (qualitative), but most of our top journals emphasize data, so theory slips
• We need to revise the norms about the linkage between theory and data; both are important and must be balanced.

A

Sutton & Staw (1995) – theory

30
Q

Role of judgment calls in meta-analysis
Important because:
• Many researchers believe a MA produces THE answer in a literature review
• MAs on the same topic are being published, and often they do not agree
• 8 of 11 steps in a MA involve judgment calls
Conclusions and recommendations:
• Researchers' judgments influence the final outcomes of a MA
• Divergent findings are often the result of these types of judgment calls
• MA authors must pay attention to the judgments they make, should report their decisions in detail, and should test the effects of their judgments when possible
• A MA should be conducted by more than 1 person
• Authors should first consider doing a narrative review
• Deciding to do a MA implicitly excludes qualitative research, which is itself a judgment made by the researcher

A

Wanous et al. (1989) – judgment calls in MA

31
Q

• A critical specification lacking from much organizational theory/research is time scale
• Time scale: the size of a temporal interval, whether subjective or objective, used to build or test theory
• The nature of relationships may vary across time scales
• Types of time scale intervals:
  o Existence interval – time needed for 1 instance of a process, pattern, or phenomenon to occur/unfold
  o Validity interval – the time scale over which the theory holds
  o Observation interval – the period of observation
  o Recording interval – the frequency of measurement
  o Aggregation interval – the time over which information is aggregated
• The time scales issue should be treated much like the levels of analysis issue

A

Zaheer et al. (1999) – time scales

32
Q

Definition: ethical behavior involves strict adherence to moral standards, or producing greater benefits than harm → greater positive than negative consequences = the greater good for the greater number of people
Ethical considerations:
• Issues of coercion, informed consent, confidentiality, deception, debriefing, misrepresentation of data, censoring, plagiarism
• Preventing and resolving ethical problems – guidelines, IRBs, peer review, informal resolution
• Special issues – internet research, global research, research on ethics itself; ethical codes can vary across these

A

Aguinis & Henle (2002) – ethics

33
Q

• Longitudinal modeling
• Issues in longitudinal modeling:
  o Is change over time meaningful? Reversible?
  o Does it proceed along a single path or multiple paths? Is it gradual?
  o Is it due to changes in instrumentation?
  o Individual or group level? Predictable differences?
  o Cross-domain changes? Do specific facets change?
• Traditional approaches:
  o Difference scores, repeated measures, time series models
• Latent variable approaches are better because they:
  o Require multiple indicators, test for qualitative change, apply to multi-group patterns, use mean-level information, detect invariance, and provide a direct and comprehensive assessment of intra- and inter-individual change

A

Chan (2002) – longitudinal modeling

34
Q

• Why "usefulness", or the degree to which a theory contains actionable solutions to "real world" problems, is not a good criterion for good theory
• Because science is fallible (the truth of any given theory is always unknown), taking prescriptive statements from a theory and applying them to an organization is problematic
• Using usefulness as a criterion may obstruct theory development because it leads researchers to build theories that only serve the needs of managers
• The authors argue that the purpose of theory in practical settings (suggesting courses of action) should be as an educational device rather than as a set of prescriptions formulated to solve problems
• The scientific theory-builder is different from the action researcher

A

Brief & Dukerich (1991)

35
Q

• Transformation of effect sizes (d) into point-biserial correlations (r) can be substantially biased when groups have unequal sample sizes
• Hunter & Schmidt (1990) offered a formula for estimating the sampling error of d; the formula assumes an equal split of group sizes, but this assumption was not explicitly stated
  o Result: underestimation of the sampling error of d
• Using the equal-sample-size formula will always lead to at least some underestimation of the sampling error of d when group sizes are unequal, and likewise when analyses are conducted with point-biserial correlations, since the point-biserial correlation is a transformation of d
• When correcting a point-biserial correlation for unequal sample sizes, the correction also increases the sampling error of the corrected r – the correction should carry through to the sampling error estimate to eliminate bias
• When sample sizes are equal across the two groups, meta-analysis of d and meta-analysis of the point-biserial r (converted from d) are equally viable
• Solution = use a formula that incorporates unequal group sizes (e.g., Hedges & Olkin, 1985; see below)
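The conversion at issue, in the form that incorporates unequal group proportions p and q = 1 − p (the familiar d/√(d² + 4) is the special case p = q = .5):

```latex
r_{pb} \;=\; \frac{d}{\sqrt{\,d^{2} + \dfrac{1}{p\,q}\,}}
```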

A

Laczo et al. (2005) – meta-analysis/sampling error

36
Q

• 5 ways in which theory informs method with respect to time: the time lag between X and Y; the duration of X and Y; rate of change; dynamic relationships when X and Y both change; reciprocal causation (X causes Y, and Y causes X)
• Method issues to consider in studies of time: research design, timing and frequency of measurement, stability of variables
• Useful analytical tools for time/causation studies: HLM, latent growth modeling (LGM), and pooled cross-sectional time series (PCSTS)
• Moderation of the causal cycle curve (MCC curve):
  o In a causal cycle with X causing change in Y, Y goes through the causal cycle
  o The best time to measure Y is during the equilibrium condition, when Y is relatively stable but has not yet started changing due to other factors
  o Measuring Y outside equilibrium (e.g., during entropic periods) underestimates the magnitude of the causal relation between X and Y

A

Mitchell & James (2001) – theory and time

37
Q

• Scale development stages: 1) item generation, 2) questionnaire administration, 3) initial item reduction, 4) CFA, 5) convergent/discriminant validity, 6) replication
• 1) Items generated deductively or inductively; keep items short; content validation
• 2) Administer to samples representative of the population; use several independent samples
• 3) Exploratory factor analysis/item analysis: eigenvalues > 1, reliability of the measure (~.70), eliminate bad items
• 4) CFA in LISREL: goodness of fit (smaller chi-square = better); delete more items if needed
• 5) Examined using MTMM and criterion-related validity (relationships with other variables it is expected to correlate with)
• 6) It is inappropriate to use the same sample both for scale development and for assessing the psychometric properties of the new measure

A

Hinkin (1998) – scale development

38
Q

• Differential prediction: the notion that tests do not predict performance equally well across groups
• Measurement bias: at the item level, bias refers to differences in the probability of correctly answering an item among individuals who have the same level of ability but belong to different groups; at the test level, bias refers to differences in expected total scores for such individuals
• IRT-based differential functioning methods resolve the confounding of bias with observed mean differences, but their results depend on statistical significance tests, which are influenced by sample size
• Solution 1 to measurement bias: relate DTF (differential test functioning) results to raw scores
• Solution 2: relate DTF results to the 4/5ths rule – assess the magnitude of bias by comparing the proportions of respondents selected from the reference and focal groups using various cut points on the observed score metric

A

Stark et al. (2004)

39
Q

• The cause of most underpowered studies = testing multiple hypotheses
• Most studies test multiple hypotheses – an adequate number of tests will be statistically significant even if the power of any single test is inadequate
• Types of power: specific comparison power (the probability that a specific comparison will be significant); any-pairs power (the probability that at least one comparison will be significant); all-pairs power (the probability that all pairs that truly differ will be significant) – differences among the types of power depend on effect size, sample size, and equality of cell sizes
• Underpowered studies lead to difficulty interpreting the results of any single study and misinterpretation of nonsignificant tests (remedies: meta-analyses, reporting all results, the recentering method)
• Replication problems: smaller effect sizes in a replication may lead to discarding the results due to nonsignificance
• Effect size/confidence interval reporting, recentering, meta-analyses, reporting all findings, and changing publication practices to report all findings may motivate more powerful studies by exposing major points of confusion in result interpretation

A

Maxwell (2004) – power

40
Q

• Meta-analysis increases statistical power
• Statistical power = the likelihood of detecting, within a sample, an effect/relationship that exists within the population
• The random-effects (RE) model permits generalizations beyond the studies in the review, whereas fixed-effects (FE) analyses only permit inferences about the estimated parameters
• FE asks: "what is the best estimate of the population effect size, and is it of practical or theoretical significance?"
• RE asks: "what is the range and distribution of population effect sizes, and what proportion of these values is small or large, negative or positive?"
• FE assumes that all samples arise from 1 population with the same effect size parameter
• RE assumes the effect sizes are heterogeneous and sampled from a distribution of population effect sizes (study weights contrasted below)
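The contrast between the two models is visible in the study weights: both weight by inverse variance, but the RE model adds the estimated between-study variance τ² to each study's sampling variance v_i.

```latex
w_{i}^{FE} = \frac{1}{v_{i}},
\qquad
w_{i}^{RE} = \frac{1}{v_{i} + \hat{\tau}^{2}}
```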

A

Cohn & Becker (2003)

41
Q

• History of research methods in I/O
• Measurement = conceptualizing and scoring the attributes of entities (instrument development, reliability, validity (criterion/content), test bias, model complexity, the DOT, test fairness)
• Design = planning and executing research to support valid inferences that generalize (experimental vs. correlational psychology, importance of longitudinal research, quasi-experiments, focus on time, field research being more generalizable)
• Analysis = making sense of data from the measurement and design (multiple regression, partial correlation, ANOVA, individual level of analysis, multi-level analysis, the significance testing debate, effect sizes, power analysis)

A

Austin et al. (2002) – history of methods

42
Q

• HLM investigates how variables at one level of analysis influence, or are influenced by, variables at another level
• 3 possible treatments of hierarchical data: 1. disaggregate the data so each lower-level unit is assigned the higher-level unit's score; 2. aggregate lower-level units to the group level; 3. recognize the interdependence of individuals within the same group (HLM)
• HLM: two models are estimated simultaneously – one modeling the relationships within each of the lower-level units, and a second modeling how these relationships within units vary between units (see below)
• Intercepts and slopes from level 1 are used as outcome measures at level 2 → investigate whether differences in slopes/intercepts are a function of group membership
• Virtually all longitudinal data/studies are hierarchical
• Design trade-off: fewer groups with more individuals per group vs. more groups with fewer individuals per group
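The simultaneous two-model setup described above, in standard HLM notation (individuals i nested in groups j, with a group-level predictor W_j at level 2):

```latex
% Level 1 (within groups)
Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}

% Level 2 (between groups): intercepts and slopes as outcomes
\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j} + u_{0j}
\qquad
\beta_{1j} = \gamma_{10} + \gamma_{11} W_{j} + u_{1j}
```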

A

Hofmann (1997) – HLM

43
Q

• Examined 261 articles from JAP, Personnel Psychology, and AMJ, each of which had to include at least one moderated multiple regression (MMR) analysis with at least one continuous predictor, one continuous criterion, and one categorical moderator – computed effect sizes
• Mean effect size = .009
• Mean power to detect a small effect = .84; medium effect = .98; large effect = 1.0
• Although observed effect sizes are smaller than expected, statistical power is sufficient to detect what is conventionally defined as a small targeted effect size
• It is likely that design, measurement, and statistical artifacts decrease observed effect sizes substantially
• IMPLICATIONS: use better methodologies; be aware of the factors that affect the power of MMR and use recently developed computer programs to calculate MMR power; small/non-detectable effect sizes do not necessarily mean that the effect of moderators (age, gender, etc.) does not exist

A

Aguinis et al. (2005) – effect size, power, categorical moderators

44
Q

• Internal validity threats: any type of 3rd-variable explanation; most problematic in quasi-experimental studies (randomization takes care of some of these threats)
• Statistical conclusion validity threats: preclude valid conclusions about the existence of treatment effects (power, Type II error, reliability of measures)
• External validity threats: concern generalizing to or across times, settings, and persons
• Construct validity threats: the possibility that the operational definition of the cause or effect can be construed in terms of more than one construct

A

Cook & Campbell (1976)

45
Q

• Moderator – a function of a 3rd variable that partitions the IV into subgroups establishing its domains of maximal effectiveness with regard to a given DV; functions like an IV
• Mediator – a function of a 3rd variable that represents the generative mechanism through which the IV influences the DV; can be an effect or a cause
• Moderator variables specify WHEN certain effects will hold; mediators speak to HOW or WHY such effects occur
• Testing mediation – regress the mediator on the IV, regress the DV on the IV, and regress the DV on both the IV and the mediator
• Establishing mediation: 1) the IV must affect the mediator, 2) the IV must affect the DV, 3) the mediator must affect the DV; if these hold, the effect of the IV on the DV should be reduced (partial mediation) or eliminated (full mediation) when the mediator is controlled

A

Baron & Kenny (1986) – testing mediation/moderation

46
Q

• Decline in the popularity of experiments
• The 'situated experiment' is proposed – a laboratory-type experiment conducted in a natural setting (e.g., in an organization)
• Differs from lab/field studies in: participants' awareness of the experiment, opportunity for random assignment, quality of manipulations, controls, and artificiality of the research setting
• Ethical considerations: deception, debriefing

A

Greenberg & Tomlinson (2004)

47
Q

• It may be inappropriate to make judgments about relationships based solely on a significance test
• Effect size = a sample-based estimate of the size of the relationship between variables (A. measures of standardized differences between group means, e.g., Cohen's d; B. measures of explained variance, e.g., r-squared and eta-squared)
• Different effect sizes should be used depending on the combination of dichotomous and/or continuous variables involved

A

Breaugh (2003) – effect sizes, significance testing

48
Q

• Validity = an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationale support the adequacy and appropriateness of inferences and actions based on test scores and other modes of assessment
• Validity generalization = systematic examination of replications by collecting validity coefficients from independent studies, which form a distribution (the mean coefficient in this distribution is the estimate of the correlation in the population)
• Reliability = information about measurement error (coefficients of stability, equivalence, internal consistency)
• Standard error of measurement = the SD of the hypothetical distribution of observed scores around a person's true score (see below)
• G-theory – uses ANOVA to test the limits of the conditions within which interpretations of scores generalize
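The standard error of measurement follows directly from the reliability coefficient:

```latex
SEM = SD_{x}\,\sqrt{1 - r_{xx}}
```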

A

Guion (2002) – reliability and validity

49
Q

• Problem with statistical significance testing: a Type I error occurs when the null hypothesis is incorrectly rejected; a Type II error occurs when the null hypothesis is incorrectly accepted. Statistical significance is defined so that the Type I error rate is at most 5%, but the Type II error rate can go up to 95%. The problem is that we don't know which error rate applies to a given study unless we know in advance whether or not the null hypothesis is false – which is the whole point of doing the study!
• Abandon significance testing and use CIs
• Meta-analysis: uses CIs, helps correct for biases due to study artifacts, meets the need of social science research to make sense of past studies by looking across studies, and estimates empirical relationships independent of study artifacts
• List of artifacts in the article summary

A

Schmidt & Hunter (2004) – significance testing/meta-analysis

50
Q

• Examined the measurement equivalence of multisource ratings by assessing the direct effects of both rating dimension and rating source on ratings, in the context of a multitrait-multirater (MTMR) approach (Air Force sample)
• MTMR construct validity: variance observed in ratings can be attributed to the direct effect of the performance dimension, the rating source, and unique variance due either to unique conceptualizations of the dimensions by individual raters or to random, unsystematic error
• Evaluated 3 MTMR-based models using CFA
• Different sources are equivalent with respect to the performance constructs being rated; trait effects > method effects; the impact of different sources on ratings differs substantially across ratings
• Incorporate more raters to reduce unreliability

A

Woehr et al. (2005) – multi-source ratings (MTMR)

51
Q

• The variance of a variable comes from 3 sources:
  1. trait variance – attributable to the construct
  2. method variance – systematic influences from the method
  3. error – random, unsystematic error
• Self-reports = one should be cautious about the inferences drawn, but the same is true for other methods
• Correlations among self-reports:
  1. longitudinal & quasi-experimental designs are better than cross-sectional ones
  2. the best design = an experiment with the IV manipulated and the DV measured
  3. use multiple sources of data to control some method variance
  4. cross-sectional designs = cannot infer causal connections

A

Spector (1994) – self-report

52
Q

• MAs are useful but not a panacea
• Inferences from a MA depend on the representativeness of the samples in the primary studies, the validity of each primary study, and the number of primary studies
• Point estimates don't tell us why or how a relationship exists
• Searches for moderators suffer from low power and are often conducted without theoretical reasoning
• Concerns about internal, external, and construct validity
• Some uses of aggregation may just move problems to another level rather than solving them
• Suggest conducting larger-sample studies with measures and manipulations that have higher levels of construct validity

A

Bobko & Stone-Romero (1998) – meta-analyses

53
Q

• Appropriateness for AMJ
• Clarity of exposition
• Technical adequacy
• Theoretical contribution
• Empirical contribution
• Interestingness, innovativeness, novelty
• Potential implications for practice
• Potential significance of the contribution
• Magnitude of contribution relative to length (parsimony)

A

Colquitt & Ireland (2009) – what do reviewers look for in a journal article?

54
Q

• 4 ways to combat intentional distortion:
  o forced choice; subtle items (shown to be less valid than obvious items); instructions not to fake; distortion-detection scales
• A faking study shows that people can distort their responses when asked to (fake good and fake bad)
• Intentional distortion in the form of social desirability did not affect the relationships between temperament constructs and performance

A

Hough et al. (1990)

55
Q

• Why is faking likely in personality instruments among job applicants?
  o Job applicants are motivated to appear positive
  o Transparent items make it easy to "guess" the correct answers
  o The accuracy of responses can't be verified
• Response distortion (RD) affects hiring decisions – even if it doesn't affect predictive validity – because top-scoring applicants may be those engaging in greater RD
  o This follows from the skewness of RD (toward the high end) and small selection ratios
  o Personality scores should be adjusted so that high-RD scorers are not the only ones chosen; e.g., with a selection ratio of 5%, 7 of 8 of those selected had extreme RD scores
• Conclusion: RD can affect the construct validity of personality scores

A

Rosse et al (1998)

56
Q

• Debate over social desirability (SD) as a source of contamination vs. a reflection of real individual differences
• The SD → emotional stability and SD → conscientiousness relationships are similar whether self-reports or other-ratings of personality are used
  o SD might be a substantive component of emotional stability and conscientiousness
• SD → school success (negative)
• SD → training performance (positive)
• SD does not lead to job performance and does not function as a mediator or predictor of job performance
• SD does not function as a suppressor variable
• Conclusion: SD does not attenuate personality → performance criterion-related validity → controlling for SD in such relationships will partial out substantive variance (given that SD relates to emotional stability and conscientiousness)

A

Ones et al (1996)

57
Q

Content analysis and review of research published in JAP and Personnel Psychology – raises concerns about the effective implementation of the scientist-practitioner model because there is a serious disconnect between the knowledge that academics are producing and the knowledge that practitioners are consuming; the gap still exists

A

Cascio & Aguinis (2008)

58
Q

• Utility analysis is important:
  o Measures will lead to more rational and productive choices about people
  o Measures will convince others to support and invest in HR management programs

A

Boudreau (1996)

59
Q

• Found that utility analysis actually reduced managers’ support for a hypothetical selection program

A

Latham & Whyte (1994)

60
Q

• Said the findings from Latham & Whyte (1994) may have stemmed from perceptions of coercion by the psychologist – need to be careful about how utility analysis is communicated

A

Cronshaw (1997)

61
Q

• The SD of employee performance in dollars (SDy) has attracted significant technical debate and psychometric measurement attention
• Further research on SDy and the logic used by its estimators may never make utility analysis estimates extremely precise; there is no objective criterion against which to evaluate SDy estimates
• SDy may not be such a critical parameter, and the authors question the continued investment in SDy estimation research
  o They "suggest utility researchers should focus on understanding exactly what Y represents"
• We may be no closer to understanding whether SDy captures variability in employee value, and journal editors may have tired of such attempts
  o Value estimates and actual output are rarely correlated above .70

A

Arvey & Murphy (1998)

62
Q

• Ethics code of psychologists and code of conduct
• Psychologists are committed to increasing scientific and professional knowledge of behavior and people's understanding of themselves and others, and to the use of such knowledge to improve the condition of individuals, organizations, and society
• Principles: beneficence and nonmaleficence (strive to benefit those with whom they work and do no harm), fidelity and responsibility, integrity, justice, respect for people's rights and dignity
• List of ethical standards…

A

American Psychologist (2002)