Midterm Flashcards
Why has psychology been criticized for not being a legitimate scientific enterprise? Mention Meehl’s criticisms.
Meehl’s Criticisms
- With a large enough sample size, anything can be rejected; there will always be some difference between two means
- Findings are context dependent
- The null is technically always wrong
- Theories are hard to refute
- Progress is non-cumulative
- Theories are never refuted nor corroborated; conflicting findings just create more “rubble”
- Prone to bandwagons and sinking ships
Why has psychology been criticized for not being a legitimate scientific enterprise? Mention measurement precision and hypothesis testing.
Measurement Precision and Hypothesis Testing
- In psychology, more precision leads to a better chance of finding a significant difference, whereas in the physical sciences more precision makes it harder to find a difference.
- As our procedures become better and more powerful, we become more likely to find the null to be false
- More precise = more likely to reject the null
- Physics makes point predictions, whereas psychology merely tests whether an effect differs from zero.
- In psychology, null statistical hypotheses are not derived from substantive theories, so their rejection does not increase the plausibility of the substantive theory.
Why is cumulative progress in psychology slow?
- We rarely refute theories
- Due to limited money and resources, large sample sizes are hard to obtain. This, in combination with significance testing, produces a large amount of conflicting results.
- Humans are harder to study
- In psychology there may be many multiway interactions needed to explain/predict a certain behavior
- Theories themselves are rarely put to the test; auxiliary theories are.
What is the problem regarding core and auxiliary theories in psychology vs. other sciences?
- There is a large gap between core and auxiliary theories in psychology vs. the hard sciences. Auxiliary theories are loosely derived from substantive theories.
* Independent testing of auxiliary theories is harder to do in psychology because so many variables are at play.
Describe Lakatos’ view of scientific progress and the historical perspective
Lakatos
- Scientists tenaciously cling to their theories
- Theories are never abandoned after refuting evidence, due to a protective belt of auxiliary theories
- Theories that propose novel content but are not corroborated with core theories are called ad hoc theories
Historical perspective
Theories must explain previous findings and build upon those findings in order for progress to be made
Describe positive and negative heuristics
- Negative
* Defensive reaction to refuting evidence by attributing the problems to less important features, blaming auxiliary theories rather than the core.
* Ex. Blame the participants, a sample size that was too small, or the sensitivity of the measure (usually blame auxiliary factors)
- Positive
* How we can modify the core to account for our findings while still preserving the core, adjusting or expanding the core.
* Ex. Theory true for men but not women, or parents are not the only reason children become psychopaths (need to look at more factors)
Describe progressive and degenerating research programs
- Progressive
* Theories that expand
* A theory remains progressive as long as its positive heuristic is still capable of anticipating novel facts
- Degenerating
* Theories that are no longer moving forward and have stalled
* However, it is considered okay to cling to a degenerating research program if no rival program exists that accounts for all the findings; it should be abandoned only when a competing theory supports and explains all the existing evidence
* Signs of a degenerating research program (one that isn’t progressing): unable to account for new emerging evidence, relies on negative heuristics (always blames auxiliaries, always on the defense), not expanding
Describe Crucial tests
- Crucial tests pit one theory against another to see which is correct
* Ex. Einstein’s theory of relativity and the solar eclipse
- The need for a crucial test is usually seen in hindsight; history sometimes dictates whether a test was a crucial one or not
How is Lakatos’ view relevant to the criticisms of psychological science vs. other sciences?
- hard sciences cling to their theories just as much as psychology
- The notion that physical sciences are consistent or cumulative is not supported and the claim that psychological sciences are inconsistent or non-cumulative is not supported either.
- Critics hold an idealized version of research practices in physics as a standard that psychology should follow; however, research in physics has shown deficits in methodology comparable to those found in psychological research.
What is construct validity and how is it related to other forms of validity?
a broad concept encompassing all forms of validity, essentially depicting the accuracy of the measure. Is the measure measuring what it is supposed to be measuring?
-As our understanding of the construct becomes clearer, measures shift to make them more accurate.
-It is a process rather than an endpoint
- It should be consulted at all points of developing a measure, not just at the end
- It is best understood as an overarching concept encompassing all types of validity
Three aspects of construct validity: substantive validity (lit reviews), structural validity, and external validity
Why is construct validity not a static quality of a test that can be established with a single study?
- Construct validity is dynamic.
- When scales are constructed, some aspects of the results will support the theory, but not all of them will
- researcher must decide whether the fault lies with the test or the theory
- one cannot just throw away years of work because a single study didn’t support the theory
- construct validity is acquired by rigorous testing of alternative hypotheses to further our understanding of the construct.
When is construct validity typically assessed and when should it be?
- Construct validity is typically considered after the test has been constructed, in a post-hoc fashion. However, it is more appropriately considered a process than an endpoint.
- Construct validity should be considered at all stages of the scale construction process
Why is a lit review an important first step in scale construction?
- Reveals whether psychologists already have a reliable scale for measuring a specific construct; if they do, development of a new scale may not be necessary.
- However, a new scale may be important if previous scales define the construct differently or are measuring ranges that are too narrow or too broad
- The literature may also reveal whether new scales are needed to advance a theory or to allow cross-validation with other measures of the same construct
- It can develop a clear conceptualization of the target constructs.
- although one may already have a general sense of the concept, literature reviews may be helpful in considering alternative explanations.
Why is it useful to write a formal definition of a construct in the very early stages of test development?
- A formal definition helps finalize the construct and define its breadth and scope.
- The researcher can define lower order components of the construct such as subcomponents, allowing the conceptualization of the construct to expand to include all overlapping subcomponents.
Why should initial item pools be over-inclusive?
- so every possible aspect of that construct is covered and the boundaries of that construct defined
- As analyses are run, weaker and unrelated items can then be dropped from the final scale
- You can always take items away, but you can’t add new ones later in the process.
What is content validity?
The degree to which a measure is representative of all the possible facets of a target construct.
- Related to relevancy: the appropriateness of a measure’s items for measuring the target construct. All items in the measure should fall within the target construct.
- Also related to representativeness: the degree to which the item pool adequately samples content from all aspects of the target construct. Often addressed in the form of subscales.
Why should items with extreme endorsement probabilities not be automatically dumped?
- Often removed because researchers believe they don’t offer much information
- It may be beneficial to know who the five percent are who responded in the opposite direction, ex. to a question on suicidal thoughts
- Many measures are tested across a wide variety of individuals (college students to psychiatric patients), and these groups may differ in their average trait levels, so excluding that item may discard important information regarding a certain group of people
- It is important to look at people on extreme ends too
What are the pros and cons of dichotomous response formats?
- Pros
- Easier scoring and analyses (computers have made this obsolete)
- Less time consuming
- Cons
- Less reliable (must make the scales longer in order to achieve the same reliability as polytomous scales)
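The reliability-vs-length tradeoff in that last bullet is usually described with the Spearman-Brown prophecy formula (not named in these notes, but a standard psychometric result); a sketch:

```latex
% Spearman-Brown prophecy formula: r_kk is the reliability of a scale
% lengthened by a factor k; r_11 is the reliability of the original scale.
r_{kk} = \frac{k \, r_{11}}{1 + (k - 1)\, r_{11}}
% Hypothetical example: doubling (k = 2) a scale with r_11 = 0.60
% gives r_22 = 1.2 / 1.6 = 0.75.
```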
Describe the rational-theoretical method of item selection
a method in which the scale developer simply writes items that appear to be consistent with the target construct.
- Pros
- Simple
- Good convergent validity
- Cons
- Poor discriminant validity
Describe the Criterion-keying method of item selection
Items are selected for a scale based solely on their ability to discriminate between individuals in a “normal” group and those from the group characterized by the construct being measured
- Item content is irrelevant
- Pros
- Good discriminant validity
- Good convergent validity
- Empirical
- Cons
- Measures are atheoretical and don’t advance psychological theory in a meaningful way
- Scales are highly heterogeneous, lack internal consistency, making proper interpretation of scores difficult
What is factor analysis and why is it a useful tool in scale construction?
A statistical procedure used to identify clusters or groups of related items on a test. It is extremely useful for producing homogeneous scales with good discriminant validity.
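A minimal sketch of how factor analysis might be used to look for item clusters, assuming hypothetical response data and scikit-learn's FactorAnalysis; the data, item count, and two-factor choice are illustrative assumptions, not part of the notes:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: 200 respondents x 8 items (e.g., Likert ratings 1-5).
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(200, 8)).astype(float)

# Fit a two-factor model (the number of factors is an illustrative assumption).
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(responses)

# Loadings: one row of fa.components_ per factor, one column per item.
# Items that load clearly on one factor and weakly on the other are good
# candidates for a homogeneous subscale; cross-loading items are candidates to drop.
for item, loadings in enumerate(fa.components_.T, start=1):
    print(f"Item {item}: loadings = {np.round(loadings, 2)}")
```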
Describe the features of good and bad candidate items
- Good Candidate Items
- items that load moderately on a primary factor and only minimally on other factors
- Bad Candidate Items
- items that load weakly on the hypothesized factor or cross-load on other factors
Describe the kind of factor analysis results that are required for measures of broad constructs that have subscales.
The item groupings must each be homogeneous and must correlate with one another for a broad construct to be broken into subscales.
Differences between internal consistency and homogeneity?
- Internal Consistency
- measured by Cronbach’s alpha
- indicates the overall degree of interrelation between a set of items
- examines the average inter-item correlation and the number of items on the scale
- Homogeneity
- uses internal consistency to establish homogeneity
- indicates the extent to which all of the items on a given scale tap a single facet of a target construct.
- examines the mean of the distribution of the inter-item correlations.
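A minimal sketch (using a hypothetical respondents-by-items score matrix) of how the two ideas differ in practice: Cronbach's alpha computed from item and total-score variances, versus the mean and spread of the inter-item correlations as a check on homogeneity:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def inter_item_summary(scores: np.ndarray):
    """Mean and spread of the off-diagonal inter-item correlations."""
    corr = np.corrcoef(scores, rowvar=False)
    off_diag = corr[np.triu_indices_from(corr, k=1)]
    return off_diag.mean(), off_diag.std()

# Hypothetical data: 100 respondents x 6 items sharing one latent trait.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 1))
scores = latent + rng.normal(scale=1.0, size=(100, 6))

alpha = cronbach_alpha(scores)
mean_r, sd_r = inter_item_summary(scores)
print(f"alpha = {alpha:.2f}, mean inter-item r = {mean_r:.2f} (SD = {sd_r:.2f})")
# A high alpha paired with a wide spread of inter-item correlations would
# suggest internal consistency without homogeneity.
```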
How will the inter item correlation matrix differ for uni- and multi-dimensional pools of items?
- There may be substantial variability in inter-item correlations in a multi-dimensional pool of items, whereas in a uni-dimensional pool the correlations would cluster around the average.
What is examined in the external validity phase and how is the task different from the task in the structural validity phase?
- Examines whether the relationships between the new measure and important test and non-test criteria are congruent with one’s theoretical understanding of the target construct and its position, relative to other similar and dissimilar constructs, in the nomological net.
- structural validity phase involves analyses of items WITHIN the new measure.
Convergent validity
the extent to which a measure correlates with other measures of the same construct
Discriminant Validity
The extent to which a measure does not correlate with measures of other constructs.
Predictive validity
The extent to which a measure can predict a criterion occurring in the future.
Concurrent validity
Relating a measure to criterion evidence collected at the same time as the measure itself.
What is measurement and what problem occurs with almost all measures?
- Measurement = the process of building models that represent phenomena of interest, typically in quantitative form
- Error will always occur
- This error occurs not just in psychology, ex. two doctors assessing amniotic fluid levels could reach different conclusions
- Almost all measures struggle with validity and reliability as no measure will be perfectly reliable and perfectly valid
Why should we expect measurement models to be eventually proven wrong?
- All measurement models will eventually be proven wrong because other models will be developed that supersede those models
- measurement models must be specified explicitly so that they can be evaluated, disconfirmed, and improved
- Comparative model testing is one of the best ways to determine which model is the “least wrong.”
Describe the two components of observed scores in classical test theory and the definition of reliability
1) Observed Score = True Score + Error
- The true score is the score each person would obtain if there were no error. It also represents the population parameter.
- Error is all the variations in the circumstances of measurement that are not related to the measurement itself.
2) Reliability = the consistency of a measurement procedure and the extent to which scores produced by the measure are replicable. It is the ratio of the true score variance to the observed score variance. A reliability of 0.70 or higher is usually sufficient, but the appropriate threshold depends on the circumstances of the study, ex. a study of suicide.
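The same definitions in equation form (standard classical test theory notation, assuming true scores and errors are uncorrelated):

```latex
% Classical test theory: observed score decomposition and reliability.
X = T + E
\quad\Rightarrow\quad
\sigma^2_X = \sigma^2_T + \sigma^2_E \quad (\text{assuming } \operatorname{Cov}(T, E) = 0)
% Reliability = proportion of observed-score variance that is true-score variance.
r_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```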
Describe the cost of low reliability
- Trouble with operationalizing constructs if the “meter stick” isn’t consistently measuring a construct
- The true correlation between a measure and the constructs measured may be underestimated (Attenuated)
- Small sample sizes combined with low reliability make detecting an effect difficult
- When true correlations are small, combining them with low reliability makes it difficult to detect a difference.
- Effect sizes will be severely underestimated
What is correction for attenuation and why should it be performed?
- Correction for attenuation is a statistical procedure in which reliability indices are used to correct for underestimated observed correlations due to unreliability.
- It should be performed so that researchers do not underestimate the strength of the relationship between two variables, which would result in lower effect sizes and possibly nonsignificant findings when an effect does exist.
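The standard correction-for-attenuation formula, with a hypothetical worked example (the numbers are illustrative only):

```latex
% Correction for attenuation: r_xy = observed correlation,
% r_xx and r_yy = reliabilities of the two measures.
r_{\text{true}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
% Hypothetical example: r_xy = 0.30 with r_xx = r_yy = 0.70
% gives 0.30 / 0.70 ≈ 0.43.
```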
What is the major benefit of generalizability theory over classical test theory statistics?
- Permits a decision maker to pinpoint the sources of measurement error and quantify them
- The idea is to obtain a certain level of generalizability, which is the extent to which a score is interchangeable with other scores, ex. similar numbers are interpreted the same way (the amniotic fluid example)
- G-theory disentangles multiple sources of error rather than the single broad error term that CTT provides
- G-theory allows researchers to see exactly where the sources of error are coming from and possibly reduce these problems
- It taps into every facet that may influence reliability
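A sketch of what disentangling error sources looks like in the simplest case, a one-facet person-by-item G-study design (the specific design is an assumption for illustration):

```latex
% One-facet (person x item) G-study: observed-score variance decomposed into components.
\sigma^2(X_{pi}) = \sigma^2_p + \sigma^2_i + \sigma^2_{pi,e}
% sigma^2_p    : variance due to persons (the true differences of interest)
% sigma^2_i    : variance due to items (a source of error CTT would lump into E)
% sigma^2_pi,e : person-by-item interaction confounded with residual error
```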
Three sources of reliability
1) Internal Consistency
- Every item is correlated with every other item, ex. Split-half reliability
2) Interrater
- How much scores vary across different raters or judges
3) Test-Retest
- How responses vary across time, ex. IQ test results remain relatively stable across multiple sessions
What is Cronbach’s alpha? Explain why alpha is not a pure index of internal consistency and why high alpha values do not necessarily indicate homogeneity.
- The average of all possible split-half reliabilities (split-half = divide the item pool into two groups, ideally at random, and compute the correlation between the two halves)
- It is not a pure index of internal consistency because alpha depends on the number of items as well as the average inter-item correlation, so a long scale can yield a high alpha even when the inter-item correlations are modest or highly variable. For the same reason, a high alpha does not guarantee homogeneity (that all items tap a single facet of the construct), so it is best to also look at homogeneity directly, ex. the distribution of the inter-item correlations.
Latent trait
A trait that underlies and directly influences individuals’ behaviors and responses.