Reliability/Validity Terms Flashcards
Reliability definition
The consistency of a measure
3 types of reliability
internal consistency, stability, equivalence
Internal consistency
How well a set of items on a test or survey measure the same thing/construct; often assessed by correlating each single item with the total scale score
Split half reliability
divide questions in half into 2 groups (odd numbered/even numbered), calculate score for each half, and compare both halves to determine correlation
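The odd/even procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the data are made up, and `pearson` and `split_half` are hypothetical helper names.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def split_half(responses):
    """Correlate each participant's odd-item total with their even-item total.

    responses: one list of item scores per participant.
    """
    odd_totals = [sum(person[0::2]) for person in responses]   # items 1, 3, ...
    even_totals = [sum(person[1::2]) for person in responses]  # items 2, 4, ...
    return pearson(odd_totals, even_totals)

# Hypothetical data: 5 participants x 4 items on a 1-5 scale
responses = [
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
r = split_half(responses)
print(f"split-half correlation: {r:.2f}")
```

A correlation near 1 suggests the two halves rank participants in much the same way.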
Cronbach’s Alpha
Average of all possible split-half reliabilities / how closely related a set of items are as a group, found by comparing the amount of shared variance among the items to the amount of overall variance
Used when items have three or more response options
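As a rough sketch of the variance comparison, alpha is commonly computed as k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The data below are made up and `cronbach_alpha` is a hypothetical helper name.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha from a participants x items table of scores."""
    k = len(responses[0])  # number of items
    # Variance of each item across participants
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    # Variance of participants' total scores
    total_var = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 participants x 4 items on a 1-5 scale
responses = [
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

When items share a lot of variance, the total-score variance is much larger than the sum of the item variances, and alpha moves toward 1.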
Kuder Richardson Coefficient
Average of all possible split-half reliabilities where each question only has 2 answers
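For two-answer items scored 0/1, the KR-20 version of the coefficient can be sketched as below. The data are made up, `kr20` is a hypothetical helper name, and the population variance of total scores is used here as an assumption (conventions differ on n versus n − 1).

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson 20 for items scored 0 (wrong/no) or 1 (right/yes)."""
    n = len(responses)      # number of participants
    k = len(responses[0])   # number of items
    # p = proportion answering 1 on each item; q = 1 - p
    p = [sum(person[i] for person in responses) / n for i in range(k)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical data: 5 participants x 4 yes/no items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
kr = kr20(responses)
print(f"KR-20: {kr:.2f}")
```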
Test-Retest Reliability
part of stability – giving identical measure to the same group of participants after time has passed
Interrater Reliability
Part of equivalence: Do multiple raters agree on ratings?
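One way to put a number on rater agreement is percent agreement; Cohen's kappa, a statistic not named in these notes, additionally corrects for agreement expected by chance. A minimal sketch with made-up ratings and hypothetical function names:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codes from two raters for the same 10 video clips
rater_a = ["awe"] * 5 + ["none"] * 5
rater_b = ["awe"] * 4 + ["none"] * 5 + ["awe"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement: {agreement:.2f}, kappa: {kappa:.2f}")
```

Here the raters agree on 8 of 10 clips, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement.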
Alternate-forms reliability
Part of equivalence: administer more than one form of a test to the same people - check to see if the forms are equivalent
Validity
does the scale measure what it is supposed to measure?
content validity
does the instrument assess all aspects of the construct and exclude irrelevant issues? e.g. an awe scale w/ good content validity would cover both aspects and exclude irrelevant concepts
face validity
A type of content validity: does the test appear, on its face, to measure what it is supposed to measure?
criterion validity
how well a measure correlates with an established measure of comparison (criterion)
what are 2 ways to assess criterion validity?
concurrent and predictive validity
Concurrent validity
does the measure relate to other measures that assess the same thing (does a new measure of gratitude relate to other measures of gratitude)
Predictive validity
does the measure predict future behavior?
construct validity
how well does the scale measure the underlying concept?
What’s the first way to assess construct validity?
Convergent/divergent validity
2nd way to assess construct validity?
factorial validity
3rd way to assess construct validity?
Discriminant validity
convergent/divergent validity
is the measure related (convergent) or unrelated (divergent) to other constructs as one would expect? e.g. A new measure of gratitude is positively related to positive affect and unrelated to intelligence
Factorial validity
Factor analysis shows whether the items measure the construct in a coherent way: it identifies underlying factors/patterns that influence observed variables by grouping together highly correlated variables, i.e. clusters of items that tend to be answered the same way
Discriminant validity
Discriminate between groups of people as expected
e.g. A measure of depression distinguishes between those diagnosed with depression and those who have not
Snowball Sample
Current participants help recruit new ones from their acquaintances / Useful when population members are hard to locate / Hard to make inferences about the population
Random Sample
Every member of the population has an equal chance of being selected
Stratified random sample
Individuals selected randomly within each stratum (subgroup) of importance
e.g. divide Skidmore students into first year, sophomore, junior, and senior and randomly select within each subgroup
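The class-year example can be sketched with Python's `random` module. The roster below is made up, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Randomly select n_per_stratum members within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        # Group people by their stratum (here, class year)
        strata.setdefault(stratum_of(person), []).append(person)
    # Draw a simple random sample inside every stratum
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: 100 students, 25 per class year
years = ["first year", "sophomore", "junior", "senior"] * 25
students = [(f"student_{i}", year) for i, year in enumerate(years)]

sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           n_per_stratum=5, seed=42)
for year, members in sample.items():
    print(year, [name for name, _ in members])
```

Sampling within each subgroup guarantees every class year is represented, which a simple random sample of 20 students would not.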
Convenience Sample
Selecting your sample based on convenience: participants are readily accessible, but the sample may not be representative of the population!
Who launched positive psychology?
Martin Seligman (APA president in 1998)
What does PERMA stand for?
Positive emotions, engagement, positive relationships, meaning, accomplishments
Who are the stoic philosophers who were the forerunners of PP/what did they believe in?
Zeno of Citium and Epictetus
Who built the Early psychological foundations?
William James, Carl Jung, Abraham Maslow, Carl Rogers, Viktor Frankl
Differences between positive/humanistic psychology?
PP:
- Seligman argues it has a stronger empirical basis
- Prefers quantitative research over qualitative
- Focuses on self and collective well-being/how people flourish in adversity
- Helping people discover and nurture strengths
- Outcome-focused
HP:
- Critics say Seligman's claim ignores existing research on humanistic approaches
- Prefers qualitative research over quantitative research
- Greater focus on self (according to Seligman)/quality of experiences
- Helping people achieve their potential
- Process-oriented
Definition of Awe
Awe is the feeling of being in the presence of something vast that transcends your current understanding of the world (Keltner, 2023, p. 7 of the Eight Wonders reading)
Components of Awe?
Vastness and accommodation/transcendence
Benefits of awe?
- Promotes generosity
- Strengthens positive emotions
- Increasing life satisfaction
- Lower stress
- Enhances immune system
Dark Side of Awe
negative/terrifying experiences producing awe – Hiking up a very steep mountain – scary because it’s so high-up but the views are beautiful
- If someone did something cruel
- Natural disasters – tsunami, earthquakes, forest fires
- Deep ocean and creatures
- Terminal diagnoses
- Mass violence
- A long-term relationship
- Negative awe can be helpful/harmful!
Non self report ways to measure awe
Observer report (Ask others (family, friends, colleagues) to provide ratings)
Facial measures (Code for facial cues using a standardized system, the Facial Action Coding System (FACS))
Physiological measures - heart rate, blood pressure, respiration, skin conductance
Neuroimaging - fMRI
Self report measures for awe
Self-reports / scales / surveys – the most accurate
Physiological measures like heart rate, eye-tracking
(You don't know if you're measuring awe specifically, but it is an accurate way to measure)
Neuroimaging
Change in facial expressions
Showing someone pictures, videos on a vast screen – to measure awe and beauty – compare small versus big screens
Pros and cons of self report measures
Pros: not very hard to do (participants report their reactions directly on a survey), not expensive, practical for logistical reasons
Rich data
Cons: biased; participants may not give honest answers if they are uncomfortable, or may put down what they think the researcher wants to hear
What was the Likert type measure/scale and who was it developed by?
Likert Type Measures - developed in 1932 by Rensis Likert for doctoral dissertation
Rating scale to assess thoughts, feelings, behaviors
Respondent given a range of options on dimensions such as:
- How strongly they experience the emotion
- How frequently they experience the emotion
- Level of agreement with the statement
Factors to consider when making the scale
Number of Questions: How many questions do you want your scale to have?
Pros/cons of using a single item measure?
Pros: simple/fast to administer, works if construct is a single dimension, narrow in scope, unambiguous, concrete
Cons: doesn’t assess the construct in a comprehensive way / small wording changes or features of response options can bias responses / with more items, idiosyncratic responses would wash out, but a single item has no such protection / not sufficient for complex and/or abstract constructs (can’t use it for a bigger concept)
For most constructs in positive psychology, multiple item measures are preferable
TRUE
What is the minimum number of items needed for internal consistency?
At least 4 items needed
How does the number of items impact reliability?
Reliability increases above 5 but each additional item makes less impact
if scale is too long, participants may struggle with focus/refuse to complete
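The diminishing return from adding items can be illustrated with the Spearman-Brown prophecy formula, which is not named in these notes and assumes every item is equally good (parallel items). The single-item reliability of 0.30 below is a made-up value.

```python
def predicted_reliability(k, single_item_r):
    """Spearman-Brown: predicted reliability of a scale with k parallel items."""
    return k * single_item_r / (1 + (k - 1) * single_item_r)

single_item_r = 0.30  # hypothetical reliability of one item
lengths = [1, 2, 3, 5, 7, 10, 20]
reliabilities = [predicted_reliability(k, single_item_r) for k in lengths]

for k, r in zip(lengths, reliabilities):
    print(f"{k:>2} items: predicted reliability = {r:.2f}")
```

Under these assumptions, going from 5 to 7 items adds about 0.07, while going from 10 to 20 items adds only about 0.08 for ten extra items.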
How many items is sufficient for most constructs in final scale?
around 5-7 items – may want to create twice as many items in initial survey
Number of points on rating scale? (anchors)
Between 5-7 scale points are optimal
Over what time period do you want participants to reflect upon?
Retrospective reports:
Remember your thoughts, feelings over past month, week, day
Present moment:
- How do you feel at this moment in time
- Experience sampling methodology (ESM) or Ecological Momentary Assessment (EMA)
- Send signals periodically (through cell phone) prompting people to respond to the survey
- ESM - research technique that asks participants to fill out questionnaires multiple times a day over a period of days - goal is to capture real-time data about thoughts/feelings
Also known as EMA - used interchangeably - goal of both methods is to reduce recall bias and enhance ecological validity
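As an illustration of the periodic signals described above, here is a sketch that draws random prompt times within a daily window. The window (9:00 to 21:00), the number of prompts, and the function name `esm_schedule` are all assumptions chosen for the example.

```python
import random
from datetime import datetime, timedelta

def esm_schedule(start_date, n_days, prompts_per_day,
                 first_hour=9, last_hour=21, seed=None):
    """Pick random prompt times each day between first_hour and last_hour."""
    rng = random.Random(seed)
    window_minutes = (last_hour - first_hour) * 60
    schedule = []
    for day in range(n_days):
        day_start = start_date + timedelta(days=day, hours=first_hour)
        # Distinct random minute offsets inside the daily window, in time order
        offsets = sorted(rng.sample(range(window_minutes), prompts_per_day))
        schedule.append([day_start + timedelta(minutes=m) for m in offsets])
    return schedule

schedule = esm_schedule(datetime(2025, 1, 6), n_days=3, prompts_per_day=4, seed=1)
for day in schedule:
    print([t.strftime("%a %H:%M") for t in day])
```

Random timing keeps participants from anticipating the prompt, so responses reflect whatever they happen to be feeling in the moment.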
Do you want all items to be worded in the same direction or include some worded in the opposite direction?
Same direction?
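If some items are worded in the opposite direction, they are typically reverse-scored before totals are computed, so that a high number always means more of the construct. A minimal sketch for a 1-5 scale, with made-up answers and hypothetical function names:

```python
def reverse_score(score, scale_min=1, scale_max=5):
    """Flip a rating so that 1 becomes 5, 2 becomes 4, and so on."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score

def scale_total(responses, reversed_items, scale_min=1, scale_max=5):
    """Sum item scores, reverse-scoring the items at the given positions."""
    return sum(
        reverse_score(score, scale_min, scale_max) if i in reversed_items else score
        for i, score in enumerate(responses)
    )

# Hypothetical answers to 4 items; the third item (index 2) is worded
# in the opposite direction, e.g. "I rarely feel grateful"
answers = [5, 4, 1, 5]
total = scale_total(answers, reversed_items={2})
print(total)
```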
item clarity: are your items clear and easy for all participants to understand?
Item wording:
- Clear, short (no more than 20 words), unambiguous, written at about a 6th grade vocabulary level, simple rather than compound sentences, relevant to the construct being measured, matters of opinion rather than factual
Items for scales should:
- Cover the entire range of affect or opinion
Avoiding double negatives
2 negatives in the same sentence
e.g. “I am not unsatisfied” // “I am not without any anxiety”
Avoid double barreled items!
items that ask more than one question but allow for only one answer
e.g. “I am sad and anxious” – could be both or sad and not anxious
Avoid leading questions
items that may prompt participants to respond a certain way!
e.g. “The majority of people state they are happy. How often are you happy?”
Who coined the term eugenics in 1883?
Francis Galton (cousin of Charles Darwin)
What was Galton’s pseudoscientific theory?
that humans could be improved through “selective breeding” (breeding organisms with specific traits to produce offspring with more desirable characteristics)
Why was this theory controversial?
Rooted in prejudiced ideas on race, class, gender —- overemphasized heredity
Racism in psychological testing: who developed the Stanford-Binet intelligence test in 1916?
Lewis Terman
Why were these IQ tests controversial?
Differences across groups were used to justify segregated schools; the tests were used to identify children with learning difficulties and children with above-average IQ and to sort them into certain placements
World War I:
- Testing used to discriminate against people of color and immigrants
Who was Arthur Jensen?
father of modern racism
Argued that IQ was hereditary, that certain racial groups performed consistently worse on these tests, and that these differences indicated a natural hierarchy of intelligence, implying some races were superior to others - misinterpreted IQ as a fixed measure of worth/promoted stereotypes about racial capabilities
Jensen was responsible for resurrecting the idea that the black population is inherently and immutably less intelligent than the white population, an ideology that immediately became known as “jensenism.”
Jensen promoted eugenics as the only practical solution to the problems facing the black community, arguing that they lacked the intelligence necessary for compensatory education programs to be successful.
“[T]here are intelligence genes, which are found in populations in different proportions, somewhat like the distribution of blood types. The number of intelligence genes seems to be lower, over-all, in the black population than in the white.”
Who was in support of racist laws:
Henry Garrett (former APA president in 1946)
provided courtroom testimony in support of segregation in 1952
1967 - testified in opposition to the Civil Rights Act
Limited opportunities: in the top 6 journals from 1974-2018, what percent of editors were people of color?
5%
People of color have historically been underrepresented in psychological research
Access to Mental Health Care:
In US, people of color are:
- diagnosed with more severe disorders
- Less likely to receive quality mental health care
- Experience more barriers to receiving mental health treatment
MLK Jr. speech to APA
Spoke at APA in 1967
- Outlined science’s responsibility to draw attention to systemic racism
- Notion of adjustment cannot be separated from oppressive social structures
Mindfulness based stress reduction (empirically supported)
Founded by Jon Kabat-Zinn
Goal was to make helpful Buddhist teachings and practices available to the mainstream U.S.
Buddhist roots were briefly acknowledged
Mindfulness framed in secular ways (non-religious ways) to appeal to larger audience like hospitals and clinics - less focus on spiritual roots
Sampling Strategies: Pros of PS 101 participant pool
- Helps support many research projects at universities
- Learning experience for students - better appreciation for the scientific process
- Can make contribution to the discipline they are studying
Ethical considerations of PS 101 participant pool
- Are students able to participate voluntarily or are they being coerced?
- Are there any consequences from withdrawing?
- What are the methodological limitations of using PS 101 students?
WEIRD participants
Western, Educated, Industrialized, Rich, Democratic
Is positive psychology WEIRD
Hendriks et al., 2019
Overview/Method
- Searched for randomized controlled trial (RCT) studies on positive psychology interventions between 1998-2017
- Included 187 articles in the study
Results
Majority of studies (78%) conducted in Western countries
Majority of studies conducted in countries classified as industrialized (71%), highly educated (71%), high income (71%), and democratic (63%)
Increase in publications from Non-Western countries since 2012
Trend toward globalization of positive psychology research
WEIRD participants
2008 study of top psychology journals: 96% of participants came from Western industrialized countries
These countries make up only 12% of the world's population
Researchers often mistakenly assume their results can generalize
Differences found between western and non western individuals with variety of psychological constructs
What’s a representative sample?
Participants in the sample are similar on key characteristics to those in the population