Stats/Methods Flashcards
• Organizations are multilevel, but have not been conceptualized as such in research.
• A multilevel approach to theory and research in organizations explains contextual, temporal, and emergent processes.
• The ‘levels perspective’ is based on the interactionist view that behavior is a function of both person and situation.
• A meaningful understanding of organizational behavior requires approaches that are more integrative, cut across multiple levels, and seek to understand phenomena from a combination of perspectives. A levels perspective offers a paradigm that is distinctly organizational.
Kozlowski & Klein (2000): levels/multilevel issues
• Within-group agreement: rwg, calculated by comparing observed group variance to expected random variance (rwg(j) is used for multiple-item scales; rwg is designed for single items) → its magnitude doesn’t depend on between-group variance, and it provides a measure of agreement for each group rather than an overall measure across all groups.
• Reliability: a measure assessing the relative consistency of responses among raters – the degree to which ratings are consistent when expressed as deviations from their means.
  o In multilevel organizational research, reliability is commonly assessed with ICC(1) (proportion of total variance explained by group membership) and ICC(2) (an estimate of the reliability of group means).
• Non-independence: the degree to which responses from individuals in the same group are influenced by, depend on, or cluster by group (indexed by ICC(1)).
• Eta-squared: calculated from a one-way random-effects ANOVA with group membership as the IV and the construct of interest as the DV – this statistic is heavily influenced by group size! (A computational sketch of rwg, ICC(1), and ICC(2) follows the citation below.)
Bliese (2000)
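A minimal numpy sketch of these indices, assuming equal group sizes and a uniform null distribution for rwg; the function names and example data are illustrative, not from the article:

```python
import numpy as np

def rwg(x, null_var):
    """Single-item within-group agreement: 1 - (observed variance / null variance).
    For an A-point scale, a uniform null variance is (A**2 - 1) / 12."""
    return 1 - np.var(x, ddof=1) / null_var

def icc_from_anova(groups):
    """ICC(1) and ICC(2) from a one-way random-effects ANOVA (equal group sizes)."""
    k = len(groups)                  # number of groups
    n = len(groups[0])               # members per group
    grand = np.mean(np.concatenate(groups))
    msb = n * sum((np.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    msw = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups) / (k * (n - 1))
    icc1 = (msb - msw) / (msb + (n - 1) * msw)   # variance explained by group
    icc2 = (msb - msw) / msb                     # reliability of group means
    return icc1, icc2

groups = [np.array([4, 4, 5, 4, 4]), np.array([2, 3, 2, 2, 3]), np.array([5, 4, 5, 5, 4])]
print(rwg(groups[0], (5**2 - 1) / 12))   # agreement in group 1 (5-point scale)
print(icc_from_anova(groups))
```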
Three views of multi-level research and issues:
• Emergent view: certain concepts apply only at higher aggregate levels and not at lower levels (e.g., institutional characteristics); only aggregate scores are relevant.
• Cross-level view: both individual and aggregate scores are relevant (e.g., norms, attraction-selection-attrition).
• Individual view: the individual level is relevant; the organizational level is irrelevant (e.g., personality).
• Because ICC is based on ANOVA, an F-test can provide an indication of ICC.
• rwg should be interpreted along with an F-test.
• ANOVA should be used as a preliminary step in multilevel (HLM) analyses.
Dansereau & Yammarino (2006)
• It has long been known that most, if not all, range restriction in real data is indirect; direct range restriction is quite rare.
• When range restriction is actually indirect, the standard (direct) corrections substantially undercorrect for the effects of range restriction and thus underestimate true validity.
• On average, correcting as if restriction were direct substantially underestimated operational validities for both job performance measures (by 21%) and training performance measures (by 28%).
• Many relationships in the I/O literature may be stronger than past estimates based on the correction for direct range restriction instead of the indirect approach advocated by Hunter & Schmidt → previous results can be viewed as a lower bound rather than the “average”. (The traditional direct correction is sketched below.)
Schmidt et al. (2006): range restriction
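For reference, a sketch of the traditional correction for direct range restriction (Thorndike’s Case II) – the formula these authors argue undercorrects when restriction is actually indirect; the input values are hypothetical:

```python
def correct_direct_rr(r, ux):
    """Thorndike Case II correction for direct range restriction.
    r:  observed correlation in the restricted sample
    ux: restricted-to-unrestricted SD ratio of the predictor (ux < 1)"""
    U = 1 / ux
    return (r * U) / (1 + r**2 * (U**2 - 1)) ** 0.5

print(correct_direct_rr(r=0.30, ux=0.60))  # hypothetical values
```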
• Longitudinal data are challenging: there is probably a degree of nonindependence, meaning a set of responses is likely to be correlated, and responses close together in time may be more strongly correlated than those far apart.
• The ordinary regression approach is problematic because it fails to account for this nonindependence; it also assumes that all individuals start at the same level and change at the same rate.
• Alternative: the random coefficients model (RCM), a compromise between (1) a regression model that ignores that observations are nested within individuals and (2) a series of regression models estimating a separate model for each individual.
• Describes the RCM steps in detail: level 1 (estimating the basic model in R) and level 2 (modeling intercept and slope variation). (A Python analogue is sketched below.)
Bliese & Ployhart (2002)
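The article works through the model in R; a rough Python analogue with statsmodels on simulated data, fitting a fixed effect of time plus a random intercept and slope per person (the data and variable names are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for person in range(50):
    b0 = 5 + rng.normal(0, 1.0)        # person-specific starting level
    b1 = 0.5 + rng.normal(0, 0.2)      # person-specific rate of change
    for t in range(4):                 # four measurement waves
        rows.append({"id": person, "time": t,
                     "y": b0 + b1 * t + rng.normal(0, 0.5)})
df = pd.DataFrame(rows)

# Random coefficients model: intercepts and time slopes vary across people
model = smf.mixedlm("y ~ time", df, groups=df["id"], re_formula="~time")
print(model.fit().summary())
```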
• Power analyses reveal that samples typically need to be larger than 30 to have adequate power.
• We should identify few independent variables and even fewer dependent variables (more will increase Type I error and redundancy in criterion relevance).
• A frequency polygon or stem-and-leaf plot (graphical display) is more telling of the data’s distribution.
• Fisherian legacy → p = .05 is arbitrary.
• The null hypothesis, taken literally, is always false in the real world; rejection of a given null therefore gives us no basis for estimating the probability that a replication of the research will again reject that null.
• The product of research inquiry is a measure of effect size, NOT a p value. Effect sizes can have p values, but a confidence interval is more informative.
• The problem with beta weights is that they are specific to the sample, and we usually want to predict for the population or future samples.
• Use unit weights (+1 for positive predictors, -1 for negative, 0 for useless) in multiple regression; sample size doesn’t affect the unit-weighted correlation. (See the sketch below.)
Cohen (1990)
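A small sketch contrasting sample-specific OLS weights with Cohen’s unit-weighting advice, on simulated data (the predictor signs and the “useless” predictor are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))          # x3 is a "useless" predictor
y = 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

# Unit weights: +1 for expected-positive predictors, 0 for the useless one
composite = x1 + x2
print(np.corrcoef(composite, y)[0, 1])        # unit-weighted validity

# OLS beta weights: optimized in this sample, so the fit is flattering here
X = np.column_stack([np.ones(n), x1, x2, x3])
betas = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.corrcoef(X @ betas, y)[0, 1])        # multiple R in the same sample
```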
• SEM (a.k.a. LISREL models): represents hypotheses about the means, variances, and covariances of observed data in terms of a smaller number of ‘structural’ parameters defined by a hypothesized underlying model – it is NOT inherently a causal model.
• Most prominent feature: latent variables, i.e., unobservable quantities or factors underlying the observed variables.
• Allows analysis of dependencies among psychological constructs without measurement error.
• SEM compares the model to empirical data, yielding fit statistics that assess the match of the model to the data.
  o If fit is acceptable, the measurement model and structural model are regarded as supported.
• Sample size should usually be more than 25x the number of parameters to be estimated, with a minimum subject-to-parameter ratio of 10:1 and at least 200 cases → SEM requires large samples. (A quick rule-of-thumb calculator follows below.)
Nachtigall et al. (2003): SEM
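A trivial sketch of the sample-size heuristics above; the thresholds are the ones quoted on this card, not a general standard:

```python
def sem_min_n(n_params):
    """Rule-of-thumb minimum N for an SEM with n_params free parameters."""
    return {"preferred (25:1)": max(25 * n_params, 200),
            "bare minimum (10:1)": max(10 * n_params, 200)}

print(sem_min_n(20))  # e.g., a model with 20 free parameters
```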
Approaches to mediation testing (3 broad approaches):
• Causal steps / joint significance approach (e.g., Baron & Kenny, 1986)
  o Establishes conditions for mediation but does not provide sufficient evidence for causal effects.
  o No estimate of effect size.
• Difference in coefficients approach (DCA) (e.g., Clogg et al.; Freedman & Schatzkin)
  o Looks at the difference between regression/correlation coefficients for the IV–DV relation before and after adjusting for the mediating variable → provides an estimate of the mediated effect and its standard error.
• Product of coefficients approach (PCA) (e.g., MacKinnon et al., 2002)
  o Divides the estimate of the mediated effect by its standard error and compares this value to a normal distribution to test significance (Sobel formula). (A sketch of the Sobel test follows the citation below.)
• The study ran a Monte Carlo simulation of all 14 mediation methods across various sample sizes and effect sizes.
➢ The MacKinnon et al. (PCA) and Clogg et al./Freedman & Schatzkin (DCA) methods had the most accurate Type I error rates and the greatest statistical power – but their assumptions are not very strong conceptually.
MacKinnon et al (2002)14 mediation methods
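A minimal sketch of the product-of-coefficients (Sobel) test using two OLS regressions; the simulated data and variable names are illustrative:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)          # mediator
y = 0.4 * m + rng.normal(size=n)          # outcome

# a: effect of X on M;  b: effect of M on Y controlling for X
fit_a = sm.OLS(m, sm.add_constant(x)).fit()
fit_b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()
a, se_a = fit_a.params[1], fit_a.bse[1]
b, se_b = fit_b.params[2], fit_b.bse[2]

# Sobel: z = ab / sqrt(b^2 * se_a^2 + a^2 * se_b^2), compared to a normal
z = (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
p = 2 * (1 - stats.norm.cdf(abs(z)))
print(f"indirect effect = {a*b:.3f}, z = {z:.2f}, p = {p:.4f}")
```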
• Inadequate statistical power means that even if real effects exist in the population, they will be overlooked, slowing the progress of theories and wasting researchers’ time and effort.
• Overpowered studies, using excessively large samples, increase oversensitivity to trivial findings.
• Determinants of power: (1) the significance criterion, alpha (the long-run probability of erroneously rejecting the null hypothesis); (2) sample size; (3) effect size (the magnitude of a phenomenon in the population).
• Across 210 articles from management journals, average power across tests was .27 for small effect sizes, .74 for medium, and .92 for large.
• Journal editors and reviewers should make explicit the need to conduct and report power analysis (many researchers are aware of power analysis but choose not to use it). (A computation sketch follows below.)
Mone et al. (1996): power analysis
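A quick sketch with statsmodels showing the interplay of the three determinants of power for an independent-samples t-test (the effect sizes and alpha are illustrative):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved for a medium effect (d = .5), alpha = .05, n = 30 per group
print(analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))

# Sample size per group needed for power = .80 with a small effect (d = .2)
print(analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05))
```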
• The foundation for all longitudinal analysis is a statistical model that defines the parameters of change for the trajectory of a single participant → comparing people then becomes the task of comparing the parameters of the personal trajectories. (The standard two-level form is written out below.)
• Various statistical approaches can be used, including covariance components models, HLM, latent curve analysis, multilevel models, random-coefficient models, and mixed models.
• Model 1: define the trajectory for person i with a linear model, then describe the population distribution of the personal trajectories.
• Model 2: the causal effect of an experimental treatment is defined as Effect_i = Y_i(E) − Y_i(C): the difference between the scores one individual would obtain under the treatment versus the control condition, defined uniquely for each person.
• Model 3: add a pretest (and a pre-pretest, for 3 data points – even better!) to compute gain scores; differences in gain scores between people should be less biased than differences in post-test scores alone.
Raudenbush (2001)
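The per-person trajectory and its population distribution (“Model 1”) are conventionally written as a two-level linear growth model; this is the standard notation rather than a quotation from the article:

```latex
\begin{aligned}
\text{Level 1 (within person):}\quad   & Y_{ti} = \pi_{0i} + \pi_{1i}\,t + e_{ti} \\
\text{Level 2 (between persons):}\quad & \pi_{0i} = \beta_{00} + u_{0i}, \qquad
                                         \pi_{1i} = \beta_{10} + u_{1i}
\end{aligned}
```

Here pi_0i and pi_1i are person i’s intercept and rate of change; comparing people means comparing these parameters.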
• Meta-analyses often find effects for relevant relationships that can’t be used because they are reported in a different metric → the current view is that beta coefficients shouldn’t be used in meta-analysis.
• A beta coefficient is a partial coefficient that reflects the influence of all predictor variables in a multiple regression model.
• Omitting available effect sizes (such as beta coefficients) increases nonsampling error.
• The study compared corresponding beta coefficients (β) and correlation coefficients reported in journals → the relationship between the two is quite robust (.70).
• A formula is proposed to convert β to r values: the correlation between observed and converted rs was .65 (so the conversion formula is also robust) → more accurate and more precise estimates of population effect sizes than imputing zeros or observed means for missing effect sizes; ↓ sampling error. (The rule is sketched below.)
Peterson & Brown (2005)
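A sketch of the conversion rule; as reported in the article (quoted here from memory, so verify against the original before relying on it), r ≈ β + .05λ, with λ = 1 for nonnegative β and 0 for negative β, applicable for β roughly between -.50 and .50:

```python
def beta_to_r(beta):
    """Approximate r from a standardized regression coefficient
    (Peterson & Brown, 2005): r = beta + .05 * lam, lam = 1 if beta >= 0 else 0."""
    lam = 1.0 if beta >= 0 else 0.0
    return beta + 0.05 * lam

print(beta_to_r(0.30))   # -> 0.35
print(beta_to_r(-0.20))  # -> -0.20
```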
• Two principled approaches to missing data: maximum likelihood (ML) estimation based on all available data, and Bayesian multiple imputation (MI).
• Missing data types: unit nonresponse (refuses to participate); item nonresponse (skips questions); wave nonresponse (participates in some waves of collection but not others); attrition or dropout.
• Other methods of dealing with missing data: case deletion; available-case analysis (also called pairwise deletion or pairwise inclusion); reweighting the remaining cases; averaging the available items; single imputation (replacing a missing item with one plausible value – e.g., regression is used to replace the missing value with a random draw from the predictive distribution of Y given X).
• Maximum likelihood estimation: the distribution of the observed data provides the correct likelihood for the unknown parameters.
• MI: each missing value is replaced by a list of simulated values, giving many alternative versions of the complete data (reflecting missing-data uncertainty). (A sketch follows the citation below.)
Schafer & Graham (2002)
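A minimal multiple-imputation sketch using statsmodels’ MICE (chained equations); the simulated data and analysis formula are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)
x[rng.random(n) < 0.2] = np.nan           # make 20% of x missing at random

imp = mice.MICEData(pd.DataFrame({"x": x, "y": y}))
model = mice.MICE("y ~ x", sm.OLS, imp)   # analysis model fit per imputation
results = model.fit(n_burnin=10, n_imputations=20)
print(results.summary())                  # estimates pooled across imputations
```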
• Misconceptions about null hypothesis significance testing (NHST): that p is the probability that H0 is true; that p’s complement is the probability of successful replication; that rejecting H0 affirms the theory that led to the test.
• The author suggests: exploratory data analysis, use of graphic methods, movement toward standardization in measurement, emphasis on estimating effect sizes using confidence intervals, and reliance on replication for the purpose of generalization.
• Rejecting the null hypothesis does not imply that the theory is established → with a large enough dataset, H0 can always be shown false.
• A correct interpretation of p values only tells us that A is larger (or smaller) than B, never how much larger or smaller (a confidence interval or some other effect size measure can be much more informative! See the sketch below.)
Cohen (1994): The Earth Is Round (p < .05)
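A sketch of the “report the interval, not just p” advice, using statsmodels to get a confidence interval for a mean difference (the two groups are simulated):

```python
import numpy as np
from statsmodels.stats.weightstats import CompareMeans, DescrStatsW

rng = np.random.default_rng(3)
a = rng.normal(loc=10.0, scale=2.0, size=40)   # e.g., treatment group
b = rng.normal(loc=9.0, scale=2.0, size=40)    # e.g., control group

cm = CompareMeans(DescrStatsW(a), DescrStatsW(b))
t, p, df = cm.ttest_ind()
low, high = cm.tconfint_diff()                 # 95% CI for the mean difference
print(f"p = {p:.3f}; mean difference CI: [{low:.2f}, {high:.2f}]")
```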
• Presents methods for making accurate meta-analytic corrections when range restriction is indirect.
• Direct range restriction: applicants are selected directly on test scores.
• Indirect range restriction: applicants are selected on a different variable that is correlated with test scores.
• Meta-analyses therefore likely underestimate the predictive validity of selection procedures.
• If range restriction is indirect, the Hunter and Schmidt (1990) formula will undercorrect.
• Procedure: first correct for measurement error in both variables using restricted-sample reliability values, then correct for range restriction using the traditional formula, but with uT rather than uX as the index of range restriction. (See the sketch below.)
• Can be used in individual studies as well.
Hunter et al. (2006): range restriction
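A sketch of that sequence with hypothetical values; u denotes a restricted-to-unrestricted SD ratio, and the uX-to-uT conversion is my derivation of the step described above (constant error variance across populations), so verify against the article:

```python
import math

def correct_indirect_rr(r_xy, rxx_restricted, ryy_restricted, rxx_unrestricted, ux):
    """Correction for indirect range restriction, per the sequence on this card.
    Returns the true-score correlation in the unrestricted population."""
    # Step 1: disattenuate using restricted-sample reliabilities
    r_tp = r_xy / math.sqrt(rxx_restricted * ryy_restricted)
    # Step 2: convert uX (observed-score SD ratio) to uT (true-score SD ratio)
    ut = math.sqrt((ux**2 - (1 - rxx_unrestricted)) / rxx_unrestricted)
    # Step 3: apply the traditional correction with uT in place of uX
    UT = 1 / ut
    return (r_tp * UT) / math.sqrt(1 + r_tp**2 * (UT**2 - 1))

print(correct_indirect_rr(r_xy=0.25, rxx_restricted=0.70, ryy_restricted=0.80,
                          rxx_unrestricted=0.85, ux=0.65))
```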
• Standardized item alpha: an index of internal consistency related to Cronbach’s alpha.
• Skewed distributions are a source of non-normality.
• Effects of skew on reliability estimates: direct (skew attenuates correlations) and indirect (skew may produce factors that adversely affect measures of internal consistency).
• Two simulation studies (continuous items, Likert-type items).
• Skew produced decreases in alpha and decreases in the average inter-item correlation.
• When using Likert-type scaling, more levels are recommended; the negative effects of skew are reduced as the number of levels per item increases.
• Transforming skewed variables may reduce the negative effects of skew. (An alpha sketch follows the citation below.)
Greer et al. (2006)
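A minimal sketch of coefficient alpha; feed it an items matrix (rows = respondents, columns = items) to see how skewed items depress the estimate. The simulated items are assumptions of the example:

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(5)
latent = rng.normal(size=(500, 1))
symmetric = latent + rng.normal(size=(500, 6))        # well-behaved items
skewed = np.exp(latent + rng.normal(size=(500, 6)))   # strongly skewed items
print(cronbach_alpha(symmetric), cronbach_alpha(skewed))
```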
• Small sample size is a primary cause of low statistical power in applied research.
• Reanalysis of Peterson et al. (2003) (a study of CEO personality, team dynamics, and firm performance) to assess how much its conclusions would change after removing one subject from the sample or using a more traditional probability level.
• In a simulation, 17 different tests were performed on a sample of 16 subjects; the new set of inferences was substantially different from that originally published.
• A quantitative approach to the original data set, given its small sample size, creates major difficulties for drawing rigorous inferences.
Hollenbeck et al. (2006)
• Reply to Hollenbeck et al.’s (2006) critique.
• The authors admit they should have explicitly discussed the small sample as a concern for the stability of their results.
• But it was better to publish a study with low power and unstable parameters that was based on good theory and was consistent with prior empirical observations.
• To increase knowledge, it may be desirable to relax traditionally stringent statistical constraints (relating to power and alpha levels) to allow research in under-explored areas.
• The benefits of publishing a study in an under-researched area substantially outweigh the risks of misinterpretation.
• Overall, Hollenbeck et al. did not sufficiently acknowledge the trade-offs associated with an emphasis on statistical power and controlling Type I error.
Peterson et al. (2006)
• Three common strategies for examining 3-way interactions: plotting, pick-a-point, and subgroup analysis (all have limitations).
• Development of a significance test for slope differences:
  1) Calculate generic formulas for the simple slopes of the X–Y relation at high and low levels of Z and W.
  2) Calculate the difference between any two pairs of slopes.
  3) Calculate the standard error of the difference between the pair of slopes; to determine whether slopes differ, the slope difference must be evaluated relative to its standard error.
  4) Test whether the ratio of the slope difference to its standard error differs from zero.
• Monte Carlo study: the slope difference test is accurate and useful, allowing researchers to test and explore slope differences that may remain undetected by alternative probing techniques. (A sketch follows the citation below.)
Dawson & Richter (2006)
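A rough sketch of one such pairwise slope-difference test via statsmodels: with moderators coded +1/-1 (high/low), the difference between the slope of x at (high z, high w) and (high z, low w) reduces to a linear combination of coefficients, which a contrast test evaluates. The data and coding are illustrative, not the authors’ procedure verbatim:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 500
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x", "z", "w"])
df["y"] = (0.3 * df["x"] + 0.2 * df["x"] * df["w"]
           + 0.15 * df["x"] * df["z"] * df["w"] + rng.normal(size=n))

res = smf.ols("y ~ x * z * w", df).fit()

# Simple slope of x at (z, w): b_x + b_xz*z + b_xw*w + b_xzw*z*w.
# Slope difference between (z=+1, w=+1) and (z=+1, w=-1) is 2*(b_xw + b_xzw),
# so test the contrast b_xw + b_xzw = 0:
names = list(res.params.index)
contrast = np.zeros(len(names))
contrast[names.index("x:w")] = 1.0
contrast[names.index("x:z:w")] = 1.0
print(res.t_test(contrast))
```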
• Measurement equivalence is an important prerequisite for meaningful group comparisons.
• Two classes of methods for detecting differential item functioning (DIF): item response theory (IRT) and confirmatory factor analysis (CFA).
• Simulation comparing the CFA-based MACS (mean and covariance structures) method and the IRT likelihood ratio (LR) approach for detecting DIF on a unidimensional 15-item scale.
• CFA and IRT were similar in DIF-detection efficacy when implemented using comparable procedures.
• Results contradict Meade and Lautenschlager’s (2004) article, which found that IRT LR outperformed MACS in small samples.
• The most important finding: using free (vs. constrained) baseline models produces superior DIF classifications.
Stark et al. (2006)
Extremely complicated; all about measuring multilevel (collective) constructs.
• Collective constructs: developed through interaction; can be global (exist only at the higher level), shared (e.g., shared perceptions), or configural (the pattern of individual differences within the group).
• Forms of composition: additive, direct consensus, referent shift, dispersion model, process model.
Used to justify aggregation:
• rwg: assesses within-group agreement in a particular group (does not include between-group variance in its calculation).
• ICC(1): a ratio of between-group variance to total variance in the measure (degree to which raters are substitutable).
• ICC(2): assesses the reliability of the group mean (is the mean computed across individuals reliable?).
• Within-and-between-analysis (WABA): assesses whether the variance of a variable lies primarily within groups, between groups, or both.
Hofmann (2002): multi-level
Guidelines for theory development, data collection, and data analysis in multi-level research.
Assumptions:
• Within-group homogeneity → focus on between-group variability.
• Individual independence → focus on between-individual variability.
• Within-group heterogeneity → focus on within-group variability.
Multi-level research results in:
• Cross-level theory
• Mixed effects: effects at different levels
• Mixed determinants: variables at different levels can all affect an outcome
• Multi-level models: patterns replicated across levels
Klein et al. (1994): levels issues
Discusses measurement model misspecification.
Latent construct models with:
• Reflective indicators: reflect one underlying construct; the most commonly used type of measure.
• Formative indicators: measure different facets of the conceptual domain; dropping an item is a problem.
• Many constructs portrayed as reflective are actually formative (e.g., job performance, transformational leadership).
Negatives of misspecification:
• Relationships between constructs are seriously underestimated.
• Type I and Type II errors increase by just under 20%.
• Inaccurate conclusions.
• With the exception of RMSEA, goodness-of-fit indices (CFI, GFI, etc.) fail to detect model misspecification.
• Measurement model misspecification can lead to biased parameter estimates and an inflated Type II error rate.
MacKenzie et al. (2005): measurement model
Goal is to develop a system of personnel decisions that is:
• scientifically and legally defensible
• acceptable for use in the organization
• up to technical standards for quality
Test construction process:
1. Identification of SMEs
2. Establishment and weighting of the content domain
3. Initial item writing
4. Editing the items
5. Selection of a validation sample
6. Item analysis
7. Constructing the final form of the test
8. Setting the cut score on the test
9. Part vs. whole scoring of the test
10. Retesting
Muchinsky (2004): test development
Sources of common method bias:
• Common source or rater
• Item characteristics
• Item context
• Measurement context
Procedural and statistical remedies exist:
• Obtain predictor and criterion variables from different sources
• Temporal, proximal, psychological, or methodological separation of measurement
• Protect respondent anonymity and reduce evaluation apprehension
• Counterbalance question order
• Improve scale items
• Many statistical remedies (e.g., controlling for a latent method factor, multitrait-multimethod (MTMM) designs)
Podsakoff et al. (2003): common method bias