Lecture 2 Flashcards

1
Q

Remember Correlations

A

  • An important statistic in test creation and in measures of test worthiness
  • Coefficients range from -1 to 1
  • Assumes a linear relationship, summarised by the line of best fit; a curvilinear relationship is not captured well by the coefficient
  • Assumptions underpinning correlation – why do they matter?
  • How does variance impact on the coefficient?
  • What about the shape of the line? (see the sketch below)
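A minimal sketch, assuming hypothetical scores on two measures, of how Pearson's r (and the shared variance, r², covered in the next card) might be computed:

    import numpy as np

    # Hypothetical scores for ten test takers on two measures
    test_a = np.array([12, 15, 11, 18, 20, 14, 16, 13, 19, 17])
    test_b = np.array([30, 34, 28, 40, 43, 33, 36, 31, 41, 38])

    # Pearson's r is the off-diagonal entry of the 2x2 correlation matrix
    r = np.corrcoef(test_a, test_b)[0, 1]

    # Squaring the coefficient gives the proportion of shared variance
    print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")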

2
Q

Shared Variance (Squared coefficient)

A

  • Squaring the correlation coefficient (r²) gives the proportion of variance the two variables share; e.g., r = .70 means r² = .49, so 49% of the variance is shared
  • What are the factors that contribute to the shared variance?
  • What is not shared (the remaining 1 − r²)?

3
Q

Reliability

A
  • Refers to how free the test is from measurement error; are you going to get the same, or a very close, score if you sit the test again? It’s about consistency & dependability
  • Depends on the construction of the test & the environment it is administered in
  • There’s no perfect test or environment, so there will always be some error, but we want to minimise it
  • Reliability is usually reported as a correlation coefficient
  • Different types of tests tend to have different levels of reliability e.g., well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (.70) because the concept is abstract & potentially fluctuates
  • Many ways of measuring reliability, including test-retest, alternative forms & internal consistency of measurement
4
Q

Test-retest reliability

A
  • Give the test twice to the same group, usually a couple of weeks apart
  • Correlate the results (see the sketch below)
  • The higher the correlation, the more reliable the test
  • Results can fluctuate depending on factors such as the time between administrations, forgetting information, learning more about the test content by studying in the interim, or being more familiar with the test format the second time around
  • Results are less likely to fluctuate if testing a stable construct (e.g., IQ)
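A minimal sketch, assuming hypothetical scores from two sittings of the same test, of the test-retest correlation:

    from scipy.stats import pearsonr

    # Hypothetical scores for the same eight people, two weeks apart
    time1 = [24, 30, 27, 35, 22, 31, 28, 33]
    time2 = [25, 29, 28, 34, 21, 33, 27, 32]

    # Reliability is estimated as the correlation between the two sittings
    r, _ = pearsonr(time1, time2)
    print(f"test-retest r = {r:.2f}")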
5
Q

Alternative, parallel or equivalent forms reliability

A
  • Making two or more versions of the same test
  • Avoids issues like people remembering or studying particular answers between test-retest administrations
  • It is hard to make the versions equal in content & level of difficulty, or to ensure administration is exactly the same
  • The test developer must demonstrate the versions are truly parallel
6
Q

Reliability as Internal consistency

A
  • Measures how test items relate to each other & the test as a whole
  • Looks within the test to measure reliability
  • E.g., in a test to measure anxiety, respondents should answer items that tap aspects of anxiety in a similar way
  • Most common forms of internal consistency (i.e., reliability) are split-half and Cronbach’s alpha
7
Q

Split-half reliability

A
  • Use one form of the test administered at one sitting. Split the test in two and correlate the scores from the two halves
  • Issues: the test may get harder as you go along, so the first half is not equal to the second
  • May compare odd- and even-numbered questions, but the halves may still not be equal & each half is a shorter test, which can decrease reliability
  • Can use the Spearman-Brown formula to compensate for the shorter test: corrected r = 2r / (1 + r) (see the sketch below)
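A minimal sketch, assuming hypothetical odd- and even-half subtotals, of a split-half estimate with the Spearman-Brown correction:

    import numpy as np

    # Hypothetical subtotals for eight test takers on the odd and even halves
    odd_half = np.array([10, 14, 9, 16, 12, 15, 11, 13])
    even_half = np.array([11, 13, 10, 15, 11, 16, 10, 14])

    # Correlation between the two halves (reliability of a half-length test)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]

    # Spearman-Brown: estimated reliability of the full-length test
    r_full = (2 * r_half) / (1 + r_half)
    print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")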
8
Q

Cronbach’s coefficient alpha & Kuder-Richardson

A
  • Try to rate internal consistency by estimating the reliability of all possible split-half combinations, correlating each item with the total and averaging (see the sketch below)
  • Kuder-Richardson is the version used with forced-choice format tests
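A minimal sketch, assuming a hypothetical respondents-by-items score matrix, of Cronbach's alpha via the standard formula alpha = k/(k−1) × (1 − sum of item variances / variance of total scores):

    import numpy as np

    def cronbach_alpha(items):
        """items: rows = respondents, columns = test items."""
        k = items.shape[1]                         # number of items
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-point ratings: six respondents x four anxiety items
    scores = np.array([
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [3, 3, 3, 4],
        [5, 4, 5, 5],
        [1, 2, 1, 2],
        [3, 4, 3, 3],
    ])
    print(f"alpha = {cronbach_alpha(scores):.2f}")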
9
Q

Validity

A
  • Validity: The extent to which all the available evidence supports that the test is actually measuring what it is intended to measure.
  • It is a central requirement for a test, without which the test items/tasks would not have meaning
  • Content validity
    • Face validity
  • Construct validity
    • Criterion-related validity
    • Predictive validity
10
Q

Content Validity

A
  • The content of a test reflects what the test is aiming to measure
  • Sometimes content validity is enough validity for a test
  • Not enough to ascertain test validity beyond achievement-type tests, e.g., for more abstract constructs
11
Q

Example of content validity

A
  • Your results on the end of semester exam should reflect what you know about what has been covered in this unit.
  • So, there should be questions that equally represent the concepts introduced
    • Threats include construct underrepresentation (a failure to capture an important aspect) and overrepresentation
  • The questions should be worded in a way that is consistent with the language the concepts were taught in
12
Q

Face validity

A
  • It refers to the look of the test, which may be superficial
    • E.g., the items in the test look as though they ought to measure what you are aiming to measure
  • Some tests may look valid and not be, and others may not look valid but are
13
Q

Construct validity

A
  • Constructs are theoretically driven ways of talking about certain features in the world
  • Construct validity asks how well a test can give a construct meaning
    • e.g., anxiety – it only exists insofar as the construct represents a set of behaviours, thoughts and feelings
    • Construct irrelevance: Scores are influenced by something other than what the test is supposed to measure e.g., anxiety or illness impacting exam score
  • Scientific evidence demonstrating that the construct (model, concept, idea, notion) is actually being measured by the test.
  • Most important when developing tests to measure abstract constructs like depression, anxiety, happiness, love, empathy.
  • Measured with statistical tools and methods…
14
Q

Criterion-Related validity

A
  • What is the relationship between the criterion (another standard) for the test and the test scores?
  • Concurrent validity
    • What is the evidence that my test compares well (e.g., is highly correlated) with the results of some other way of knowing (i.e., corroboration)?
  • Relates to what is known at this point in time
15
Q

Convergent validity

A
  • If you think your test measures post-traumatic growth, you would expect it to be related to other instruments designed to measure positive post-trauma perception (e.g., SRG & Thriving)
  • You don’t want a perfect correlation with other tests as that would make yours redundant.
  • Depending on how close the theoretical connection is between the tests, the coefficient will vary.
16
Q

Discriminant validity

A
  • As with convergent validity, discriminant validity uses other established tests to test construct validity.
  • This time, you are looking for little or no correlation between your measure and a measure of a theoretically distinct construct (e.g., the IES-R & the PTGI).
  • Knowing what you are dealing with may have significant clinical influence (e.g., Shakespeare-Finch & de Dasell, 2009).
17
Q

Predictive validity

A
  • If a standard to compare to is not currently available, interest turns to predictive validity
  • The relationship between the test scores now and a standard in the future
  • For example, OP scores and first-year academic success – depends on age…
  • OP is a better predictor for school leavers, while learning strategies are more predictive for mature-age students
  • Combining evidence often offers more predictive validity e.g., OP, introversion and learning strategies in school leavers
18
Q

Cross-cultural fairness

A
  • The idea that ethnicity, gender, class, background etc. impact on results
  • Lots of laws have been passed in the US about tests being culturally fair, so as not to repeat errors of the past that disadvantaged minority groups e.g., employment tests must be able to demonstrate the test is relevant to the job sought
  • E.g., What number comes next in the sequence one, two, three, __________?
    • Mong or yuur mong
  • As wallaby is to animal, so cigarette is to _______
    • Kuuk Thaayorre of Edward River
19
Q

Practicality

A
  • Choosing the right test to administer: time, format, cost etc.
  • Time
    • Time taken to do the test needs to reflect the person targeted e.g., attention span, age, time available to do the test
  • Cost
    • Many tests cost money & some cost a lot. Balance the cost of the test against its reliability and the level of need to take it
  • Format
    • Types of questions, font size, layout; multiple choice (MC) can lower anxiety but is not always culturally fair (e.g., MC formats may favour white males)
  • Readability
    • Test items should be reviewed for readability – our school
  • Ease of administration, scoring & interpretation
    • Understanding the test manuals
    • How many people are taking the test & does this impact on ease of administration?
    • Level of training needed to administer, score & interpret the results
    • How long will scoring take and how long will the report take?
    • Time needed to explain results to the test taker
    • Other materials like publisher’s preformatted sheets
20
Q

Selecting & administering a good test

A
  • There are thousands of tests – how to choose?
  • What are the goals of the client or researcher?
  • Which tests can achieve that goal?
  • Source tests through articles, books, publishers’ catalogues, test libraries, online
  • Major test categories include:
    • IQ, aptitude, achievement, behaviour, development, personality, neuropsychological, science, sensory perception, speech, hearing…
  • Examine research about the test, such as its validity & reliability data
    • PCA/EFA & CFA
    • Different forms of reliability testing
    • Different forms of validity testing
  • Based on all the information and steps outlined, make a balanced & informed choice
21
Q

Making meaning out of raw scores

A
  • Raw scores have not been manipulated in any way
  • They mean nothing without being put in context, so we might compare the score an individual gets with the ‘normed score’
    • How did the client fare next to others from the same type of group who have taken the test before? (see the sketch below)
    • Compare people from 2 groups e.g., using percentiles
    • Compare results for one person on 2 or more tests – sometimes discrepancies between these scores indicate an impairment of sorts
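A minimal sketch, assuming hypothetical norm-group scores, of placing a client's raw score in context as a percentile:

    from scipy.stats import percentileofscore

    # Hypothetical norm group: previous test takers from the same group
    norm_group = [55, 62, 48, 71, 66, 59, 74, 52, 68, 60]

    # Where does a client's raw score of 66 sit relative to the norm group?
    pct = percentileofscore(norm_group, 66)
    print(f"A raw score of 66 falls at the {pct:.0f}th percentile")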
22
Q

Frequency distributions

A
  • What was the score and how often did it occur? By listing scores in numerical order you can easily see whether a person scored higher or lower than most on the distribution
  • Might list individual scores or groups of scores
  • Histograms & frequency polygons etc. assist in getting an overview of the data; we can learn a lot about the data from its shape (see the sketch below)
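A minimal sketch, assuming a hypothetical list of scores, of tallying a frequency distribution and printing a rough text histogram:

    from collections import Counter

    # Hypothetical test scores
    scores = [3, 5, 4, 5, 2, 4, 4, 3, 5, 4, 1, 4]

    # Frequency distribution: each score and how often it occurred
    freq = Counter(scores)
    for score in sorted(freq):
        print(f"{score}: {'#' * freq[score]} ({freq[score]})")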
23
Q

Standard Scores

A
  • Percentiles are commonly used (e.g., 75th percentile)
  • z-scores are standard scores
  • The mean always = 0 and the SD = 1
  • z = (X − M) / SD, i.e., the score, less the mean, divided by the standard deviation; it is therefore sensitive to all components of the variance equation, including sample size (see the sketch below)
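A minimal sketch, assuming hypothetical raw scores, of converting them to z-scores:

    import numpy as np

    # Hypothetical raw scores
    raw = np.array([42, 50, 38, 61, 55, 47])

    # z = (score - mean) / standard deviation
    z = (raw - raw.mean()) / raw.std(ddof=1)

    print(np.round(z, 2))
    print(f"mean = {z.mean():.2f}, SD = {z.std(ddof=1):.2f}")  # ~0 and 1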
24
Q

T scores

A

Another standardised score
T-scores are often used in personality testing
M = 50 and SD = 10
T = z(SD) + M, i.e., T = 10z + 50 (see the sketch below)
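A minimal sketch, assuming hypothetical z-scores, of rescaling them to T-scores:

    import numpy as np

    # Hypothetical z-scores for five test takers
    z = np.array([-1.2, -0.3, 0.0, 0.8, 1.5])

    # T = z * SD + M, with M = 50 and SD = 10
    t = z * 10 + 50
    print(t)  # [38. 47. 50. 58. 65.]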

25
Q

Quantitative

A
Deductive
Positivist 
Realist
Objective
Reductionist
Generalisation
Numbers
26
Q

Qualitative

A
Inductive
Interpretive
Constructivist
Subjective
Holistic
Uniqueness
Words
27
Q

A word about qualitative rigor

A

Qualitative research has its own ways of investigating the equivalent of quantitative reliability and validity – trustworthiness, e.g., triangulation.
Qual. methods can be used to support quant. measures.
They can also be used to inform quant. studies,
e.g., Shakespeare-Finch, J., Wehr, T., Kaiplinger, I., & Daley, E. (2014). Caring for emergency service personnel: Does what we do work? Proceedings of the Australia & New Zealand Disaster & Emergency Conference, Gold Coast (QLD), 5th–7th May 2014.
BUT – qual. and quant. are underpinned by very different philosophical assumptions,
e.g., exploring construct meaning vs. measuring constructs;
idiographic vs. nomothetic.