Section B: Scientific Processes - A+I Reliability & Validity Flashcards
What is meant by the term ‘reliability’?
Reliability refers to the CONSISTENCY of data findings.
What are the THREE ways of assessing reliability?
-Inter-observer reliability (external) - whether the data recorded by different observers is consistent.
-Split-Half (internal) - whether the items within a test are consistent with one another.
-Test-retest (external) - whether the results are consistent over time.
What does the term ‘internal reliability’ refer to?
Internal reliability assesses the consistency of results across items within a test.
What does the term ‘external reliability’ refer to?
External reliability refers to the extent to which a measure produces consistent results from one use to another – are the results consistent over time?
Which method ASSESSES INTERNAL RELIABILITY?
The Split-Half Method.
What does the split-half method involve?
-What does it refer to?
-What does it measure?
The split-half method refers to the INTERNAL CONSISTENCY OF QUESTIONNAIRES AND TESTS SUCH AS PSYCHOMETRIC TESTS.
It measures THE EXTENT to which all parts of the test CONTRIBUTE EQUALLY TO WHAT IS BEING MEASURED.
What are the FOUR steps behind the split-half method?
- SPLIT A TEST INTO TWO HALVES of equal length (e.g. a 20-question test split into two halves of 10). For example, one half may be composed of the EVEN-NUMBERED questions while the other half is composed of the ODD-NUMBERED questions (or simply Qs 1-10 = first half, Qs 11-20 = second half).
- Administer BOTH halves TO THE SAME INDIVIDUAL.
- REPEAT FOR A LARGE GROUP OF INDIVIDUALS (each participant completes the first half, followed by the second half - REPEAT WITH A LARGE SAMPLE).
- Look for a positive correlation between the scores for both halves. A CORRELATION OF ROUGHLY 0.8 would be SIGNIFICANT and indicate HIGH internal reliability (a minimal worked sketch follows below).
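As a rough illustration of the final step, the sketch below splits some invented 20-item responses into odd- and even-numbered halves, sums each half per participant, and correlates the half-scores. The data, the 0/1 item scoring and the 0.8 rule of thumb are assumptions for illustration only, not part of the original flashcards:

```python
# A minimal sketch of the split-half check (invented 0/1 item scores, not real data).
import numpy as np
from scipy.stats import pearsonr

# Each row is one participant's answers to items 1-20 of a hypothetical test.
responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1],
])

# Split each participant's test into two halves: odd-numbered vs even-numbered items.
odd_half = responses[:, 0::2].sum(axis=1)    # items 1, 3, 5, ... summed per participant
even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, ... summed per participant

# Correlate the two sets of half-scores across participants.
r, _ = pearsonr(odd_half, even_half)
print(f"Split-half correlation: r = {r:.2f}")
# A correlation of roughly +0.8 or above would indicate high internal reliability.
```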
When does the ‘split-half’ technique work best?
This technique works best when there are an EVEN number of questions within a ‘test’, but also when the questions on the test measure the SAME construct (e.g. the Authoritarian Personality) or knowledge area.
Which method assesses EXTERNAL RELIABILITY? What does it specifically assess?
The TEST-RETEST method –> assesses the EXTERNAL reliability of a test and measures the STABILITY of the test over time.
Why might test-retest be particularly useful for clinical psychologists?
This method is especially useful for tests that measure stable traits or characteristics that aren’t expected to change over short periods.
If it were not for the reliability of such tests, some individuals might not be successfully diagnosed with disorders such as depression and consequently would not be given appropriate therapy.
Why might the timing of a retest be important? What if too little time has lapsed or too much time has lapsed?
A disadvantage of the test-retest method is that it takes a long time for results to be obtained. Reliability can also be influenced by the time interval between the tests and by any events that might affect participants’ responses during this interval.
The timing of the re-test is important; if the duration is too brief, then participants may recall information from the first test, which could bias the results.
Alternatively, if the duration is too long, it is feasible that the participants could have changed in some important way which could also bias the results.
What would a typical test-retest assessment involve?
A typical assessment would involve giving participants the same test on two separate occasions.
The results from both occasions are compared and correlated; if the same or similar results are obtained, then external reliability is established.
What sort of correlation would indicate consistency between the two sets of results?
Statistical testing can be used to help determine whether the test has internal and external reliability - A CORRELATION OF ROUGHLY 0.8 IS SAID TO BE SIGNIFICANT and would indicate consistency between the two sets of results (a minimal sketch follows below).
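As a rough sketch of what a test-retest check might look like in practice, the totals below are invented scores for the same participants on two occasions; the scores and the 0.8 rule of thumb are illustrative assumptions only:

```python
# A minimal sketch of a test-retest reliability check (hypothetical questionnaire totals).
from scipy.stats import pearsonr

first_administration = [12, 25, 18, 30, 7, 22, 15, 27]    # total scores, occasion 1
second_administration = [14, 24, 17, 29, 9, 21, 16, 25]   # same participants, occasion 2

# Correlate the two administrations: a strong positive correlation suggests the test
# is stable over time (external reliability).
r, p_value = pearsonr(first_administration, second_administration)
print(f"Test-retest correlation: r = {r:.2f} (p = {p_value:.3f})")
# A correlation of roughly +0.8 or above is taken to indicate high external reliability.
```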
i) What is meant by ‘inter-rater/observer reliability’?
ii) What is the process behind checking for inter-rater reliability? If data is similar, what can be concluded (what type of reliability is there?)
Inter-rater reliability refers to the DEGREE TO WHICH DIFFERENT RATERS GIVE CONSISTENT ESTIMATES OF THE SAME BEHAVIOUR.
It is assessed when a single event is measured simultaneously and independently by two or more trained individuals. If the data they record is SIMILAR, then the measure has external reliability.
Why is it important to have OPERATIONALISED categories for inter-observer reliability checks?
-If two researchers are observing the ‘aggressive behaviour’ of children at a nursery, they would each have their own subjective opinion regarding what aggression comprises. In this scenario, it is unlikely they would record aggressive behaviour in the same way, and the data would be unreliable.
However, if they were to observe OPERATIONALISED BEHAVIOURAL CATEGORIES of aggression, this would be more objective and make it easier to identify when a specific behaviour occurs.
For example, while “aggressive behaviour” is subjective and not operationalised, “pushing” is objective and operationalised. Researchers could simply count how many times children push each other over a set duration of time, and the two observers’ counts could then be compared (a minimal sketch of this follows below).
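The sketch below shows what such a comparison might look like: two hypothetical observers independently tally the operationalised category ‘pushing’ for the same ten children, and the tallies are correlated. The counts and the 0.8 rule of thumb are illustrative assumptions:

```python
# A minimal sketch of an inter-observer reliability check (made-up tallies of "pushing").
from scipy.stats import pearsonr

# Number of pushes tallied for the same ten children, recorded independently.
observer_a = [3, 0, 5, 2, 7, 1, 4, 0, 6, 2]
observer_b = [3, 1, 4, 2, 7, 0, 4, 1, 6, 3]

r, _ = pearsonr(observer_a, observer_b)
print(f"Inter-observer correlation: r = {r:.2f}")
# If r is roughly +0.8 or above, the two observers' tallies are consistent,
# i.e. the observation has good inter-rater (external) reliability.
```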
Identify two other methods where inter-rater reliability testing would be important. Explain why?
CONTENT ANALYSIS —> multiple coders may be involved in analysing materials (articles etc). Inter-rater reliability ensures that different coders are consistently applying the coding scheme and interpreting the content in a reliable and therefore consistent manner.
IN QUESTIONNAIRES USING THE SPLIT-HALF METHOD –> Inter-rater reliability testing is used to determine the correlation coefficient between the scores on the two halves, to establish the degree of agreement or consistency between them.
What sort of correlation would indicate consistency between two researchers?
Statistical testing can be conducted on these correlations to determine the STRENGTH of the agreement and if the test has internal and external reliability.
-A CORRELATION OF ROUGHLY 0.8 IS SAID TO BE SIGNIFICANT.
If agreement is not found in terms of internal and external reliability, researchers will seek to improve the reliability of their test.
How can researchers IMPROVE THE RELIABILITY OF A PROCEDURE? (think scripts).
1 - Think SCRIPTS –> Procedures have higher reliability when they are STANDARDISED and WELL-DOCUMENTED, with instructions on how to REPLICATE the procedure.
E.g. Pre-recorded instructions (standardised).
-This enhances reliability as it allows other researchers to replicate the same study in different environments to check for the consistency of findings.
How can researchers IMPROVE THE RELIABILITY OF A PROCEDURE? (think environment and control).
2 - Think ENVIRONMENT AND CONTROL –> Lab experiments tend to have higher reliability than other experimental methods due to the well-controlled environment and subsequent STRICT CONTROL and LIMITING OF EXTRANEOUS VARIABLES.
-Therefore, researchers can be confident that it is the IV (which is being manipulated) that is having an effect on the DV, as all other extraneous variables (which could otherwise become confounding variables) are controlled.
Control could also be improved through the use of a control group.
How can researchers IMPROVE THE RELIABILITY OF A PROCEDURE?(think ETHICS)
3 - Think ETHICS –> By obtaining informed consent from participants, maintaining their confidentiality, and minimising the risk of psychological harm, researchers take into account ethical considerations.
Open and transparent reporting of the study’s aims, methodology and results in a debrief, or when gaining informed consent contributes to the reliability and integrity of psychological research.
How can psychologists improve the reliability of OBSERVATIONAL RESEARCH?
Think in terms of BEHAVIOURAL CATEGORIES –> Clearly OPERATIONALISE THE BEHAVIOURAL CATEGORIES (how you will measure this behaviour).
For example, verbal aggression could be categorised based on the number of times a child swears at or insults somebody else during the observation.
Once this is completed, what could researchers do before the study?
-Behavioural categories could be pre-defined (specific and operationalised) using a TOP-DOWN approach in which the categories for data are imposed before the research begins.
E.g. verbal aggression could be split into categories PRIOR to the observation taking place such as ‘SWEARING’ and ‘HARSH INSULTS’ rather than using a bottom-up approach which allows categories to emerge from the content.
How can psychologists improve the reliability of OBSERVATIONAL RESEARCH? (Think in terms of equipment!)
Think in terms of equipment —> Materials should be RANDOMISED - randomisation is the process of putting a group of items into a random (unpredictable) order, like shuffling cards in a card game.
It can also refer to the order in which trials are presented in an experiment, to avoid any systematic errors that might occur as a result of the order in which the trials take place. This reduces bias because the researcher has no control over the order of items/trials (see the sketch at the end of this card).
Equipment should be kept the same in observations (aided by the use of top-down approach) to ensure consistency in how behaviours are identified and recorded.
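As a rough illustration of randomising the order of materials or trials, the sketch below shuffles a made-up list of stimulus words into a fresh unpredictable order for each participant; the word list and participant count are assumptions for illustration:

```python
# A minimal sketch of randomising the order of materials/trials (illustrative items only).
import random

trial_words = ["cat", "lamp", "river", "chair", "apple", "stone"]  # hypothetical stimuli

# Shuffle into an unpredictable order for each participant, like shuffling a deck of
# cards, so no systematic order effect can creep in across the sample.
for participant in range(1, 4):
    order = trial_words[:]       # copy so the master list stays intact
    random.shuffle(order)        # in-place shuffle into a random order
    print(f"Participant {participant}: {order}")
```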
How does having multiple researchers perform the same study with the same behavioural categories allow for inter-rater reliability to be increased?
It allows inter-rater reliability to be increased as there is LESS ROOM FOR SUBJECTIVITY when CATEGORIES ARE CLEARLY DEFINED AND OPERATIONALISED, which allows the consistency of results to be tested through replication.
-Using standardised observation protocols and conducting multiple observations (e.g. to test for INTRA-RATER/INTER-RATER reliability) can help establish the consistency of findings through replication - a valid method of assessing the reliability of a study.
How can psychologists improve the RELIABILITY of questionnaires?
1 - Think in terms of the WEIGHTING OF QUESTIONS:
-Weighting of questions refers to ASSIGNING DIFFERENT LEVELS OF IMPORTANCE/VALUE TO INDIVIDUAL QUESTIONS BASED ON THEIR RELEVANCE to the research topic or construct being measured.
-This means that certain questions may carry more weight or significance in determining the overall score/outcome of the questionnaire.
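As a rough sketch of how weighting might work in practice, the example below multiplies one participant’s ratings by per-question weights before summing; the ratings, weights and five-question format are invented for illustration:

```python
# A minimal sketch of weighting questionnaire items (invented ratings and weights).
answers = [4, 2, 5, 3, 1]            # one participant's ratings on five questions (1-5 scale)
weights = [2.0, 1.0, 1.5, 1.0, 0.5]  # questions judged more relevant carry more weight

# The overall score is the weighted sum: more heavily weighted questions contribute
# more to the participant's total.
weighted_score = sum(a * w for a, w in zip(answers, weights))
print(f"Weighted questionnaire score: {weighted_score}")
```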