NCE Study Flashcards
Types of Validity (Testing)
Content Validity - Do the items on the test comprehensively cover the construct being measured?
Construct Validity - How well does the test measure the abstract trait or psychological construct it purports to measure? (Used within psychology and psychometrics.)
Criterion Validity - 2 types
Concurrent Validity - How does this test compare to other measures that purport to measure the same construct? Generally the test in question is examined against well-established measures of that construct.
Predictive Validity - How well does the test predict future behavior?
Consequential Validity - Social implications of using tests
Convergent Validity - method used to assess a test's construct/criterion validity by correlating test scores with an outside measure of the same or a related construct
Discriminant Validity - the test will not reflect unrelated variables; no correlation will be found between two constructs that appear to have no connection (e.g., if phobias are unrelated to IQ, then correlating clients' IQ scores with their scores on a test for phobias should yield no significant correlation)
Ipsative vs. Normative
Ipsative measures compare traits/scores within the same individual within the same measure (KOIS)
Normative measures compare traits/scores with others, either with the same measure or different measures (utilizing statistics such as percentile/percentile rank to show differences)
Power vs. Speed (Testing)
Power tests focus on raw mastery of the material, without emphasis on the ability to complete the tasks within a shortened time frame
Speed tests are made up of items that are normally easier, yet the test generally cannot be completed before the “timer” is up; they measure efficiency/speed on the task rather than mastery
Spiral vs. Cyclical (Testing)
Spiral Tests get progressively more difficult
Cyclical Tests utilize “mini” spiral tests within the same measure but on different sections
Validity vs. Reliability
Validity - How well the test/measure/battery is able to measure a given construct
Reliability - How well the test/measure/battery is able to produce consistent results (e.g., a broken scale that always gives the same wrong reading is reliable but not valid)
Incremental Validity
used to describe the process by which a test is refined and becomes more valid as contradictory items are dropped
refers to a test's ability to improve predictions when compared to existing measures that purport to facilitate selection in business or educational settings
when a test has incremental validity - it provides you with additional valid information that was not attainable via other procedures
Synthetic Validity
helper or researcher looks for tests that have been shown to predict each job element or component; tests that predict each component (criterion) can then be combined to improve the selection process
How to test Reliability?
parallel forms - each form has the same psychometric/statistical properties as the original instrument and the other parallel forms
In order to establish a reliability correlation coefficient, a single group takes parallel forms of the test and the two sets of scores are correlated. Half the individuals take Form A first and the other half take Form B first, to control for fatigue, practice, and motivation effects.
test-retest reliability - tests for consistency or “stability” which is the ability of a test score to remain stable or fluctuate over time when the client takes the test again
- generally wait at least 7 days before the retest; only appropriate for traits such as IQ, which remain stable over time and are not altered by mood, memory, or practice effects
split-half method - splitting a test/measure in half (e.g., even-numbered items as one half and odd-numbered items as the other, or assigning items via random numbers) and correlating the two halves.
In this situation, the individual takes the entire test as a whole and then the test is divided into halves. The correlation between the half-scores yields a reliability coefficient; see the sketch after this list. (The test cannot simply be split into a first half and a second half, because practice and fatigue effects would confound the data.)
scorer reliability (interrater/interobserver) - several raters assess the same performance; normally utilized with subjective tests such as projectives to ascertain whether the scoring criteria are such that two people will produce roughly the same score
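A minimal sketch of the split-half calculation, assuming Python with numpy (my choice of tooling; the item data are hypothetical). The same correlation machinery applies to parallel forms and test-retest, where the two score sets come from two forms or two administrations rather than two halves.

```python
import numpy as np

# Hypothetical item scores (1 = correct, 0 = incorrect) for 5 examinees
# on a 10-item test; rows are examinees, columns are items.
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
])

# Odd-numbered items form one half, even-numbered items the other
# (column indices 0, 2, 4, ... correspond to items 1, 3, 5, ...).
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# The Pearson correlation between the two half-scores is the
# reliability coefficient for half the test.
r_half = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Half-test reliability: {r_half:.2f}")
```

Because each half is only half as long as the full test, the half-test correlation is typically corrected upward with the Spearman-Brown formula (see the card below).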
What is a reliability coefficient and what does it represent?
A reliability coefficient of .70 denotes that 70% of the variance in obtained scores is accurate while 30% is due to error. That is to say, 70% of the obtained score variance represents true differences on the construct/personality attribute, while 30% can be accounted for by error: 70% is true variance and 30% constitutes error variance.
How do you demonstrate the variance of one factor accounted for by another?
Square the correlation coefficient and multiply by 100: .70^2 = .49, and .49 x 100 = 49%. This is the coefficient of determination.
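As a worked check of that arithmetic, a minimal Python sketch (the r value is hypothetical):

```python
# Square the correlation coefficient, then multiply by 100 to get the
# percentage of variance in one factor accounted for by the other.
r = 0.70
coefficient_of_determination = r ** 2  # 0.49
print(f"{coefficient_of_determination * 100:.0f}% of variance accounted for")  # 49%
```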
Francis Galton’s views on intelligence
intelligence was normally distributed like height/weight and that it was primarily genetic
Charles Spearman
General ability (g), which was thought to be applicable to any mental task, and specific abilities (s), which apply to particular tasks
Fluid vs Crystallized Intelligence
Fluid intelligence is flexible, culture-free, and adjusts to the situation
Crystallized intelligence is rigid and does not change or adapt
Internal consistency reliability (e.g. Homogeneity) of a test without split-half
Kuder-Richardson coefficients of equivalence (KR-20/KR-21). "Internal consistency": is each item on the test measuring the same thing as every other item?
Cronbach’s Alpha
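A minimal sketch of Cronbach's alpha, assuming Python with numpy (the response matrix is hypothetical). KR-20 is the special case of this formula for dichotomously scored right/wrong items.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    # alpha = (k / (k - 1)) * (1 - sum(item variances) / total-score variance)
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert-type responses; rows = examinees, columns = items.
responses = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```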
Phenomenon where a person taking an inventory often tries to answer the questions in a socially acceptable manner
Social Desirability (the right way to feel in society); Deviation occurs when the person is in doubt and will give unusual responses
Standard Error of Measurement
denotes how accurate or inaccurate a test score is. If a client took the same test over and over, you could plot a distribution of the scores; the SEM is the standard deviation of that distribution. A low standard error means high reliability.
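The standard formula is SEM = SD * sqrt(1 - reliability). A minimal Python sketch with hypothetical values:

```python
import math

sd = 15             # hypothetical standard deviation of the test scores
reliability = 0.91  # hypothetical reliability coefficient

sem = sd * math.sqrt(1 - reliability)
print(f"SEM: {sem:.1f}")  # 4.5; roughly 68% of obtained scores fall
                          # within +/- 1 SEM of the true score
```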
Test length and its effect on reliability and what test is used to measure the differences pre/post
Increasing test length raises reliability
Shortening test length reduces reliability
(thus raising or lowering the reliability coefficient)
The Spearman-Brown formula is used to estimate the impact that lengthening or shortening has on a test's reliability coefficient.
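The prophecy formula is r_new = (n * r) / (1 + (n - 1) * r), where n is the factor by which the test is lengthened or shortened. A minimal Python sketch with hypothetical values:

```python
def spearman_brown(r: float, length_factor: float) -> float:
    # Predicted reliability when a test is made `length_factor` times
    # its current length: r_new = (n * r) / (1 + (n - 1) * r)
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Doubling a test whose current reliability is .60 raises it to .75.
print(f"{spearman_brown(0.60, 2):.2f}")    # 0.75
# Cutting the same test in half lowers reliability to about .43.
print(f"{spearman_brown(0.60, 0.5):.2f}")  # 0.43
```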
Self-reports are not effective when used in isolation because…
Clients may give inaccurate answers due to social desirability, not wanting to disappoint the counselor, not wanting to reveal difficulty, etc. The report could be biased, and this is a "reactive effect" of self-monitoring
What should the client/public know about testing?
That a single test is a single piece of data and it is not infallible. Clients should know the limitations of testing.
How did the Parisian test (Binet's IQ test) become Americanized?
It was revised by Lewis Terman, who was associated with Stanford University, thus becoming the Stanford-Binet
What is the Item Difficulty Index? And what is the difference between receiving a 0.0 and a 1.0? Further, what do tests aim for regarding the difficulty index?
The item difficulty index is calculated by dividing the number of persons tested who answered the item correctly by the total number of persons tested. 1.0 indicates that everyone answered the item correctly; conversely, 0.0 indicates that no one answered it correctly. Tests generally aim for an average difficulty index of .5.
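A minimal sketch of the calculation with hypothetical counts:

```python
# Item difficulty index p = number answering correctly / number tested.
def item_difficulty(num_correct: int, num_tested: int) -> float:
    return num_correct / num_tested

print(item_difficulty(40, 80))  # 0.5 -> the difficulty tests generally aim for
print(item_difficulty(80, 80))  # 1.0 -> everyone answered the item correctly
print(item_difficulty(0, 80))   # 0.0 -> no one answered the item correctly
```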
What is an experiment?
The most valuable type of research. Utilizes treatment controls via the experimenter and randomization/random assignment used in group selection. An experiment attempts to control for/eliminate extraneous variables in order to assess theorized cause and effect relationships.
Quasi-Experiment
Researcher uses pre-existing (intact) groups rather than randomly assigning subjects. You cannot state with any degree of statistical confidence that the IV caused the DV.
Ex post facto study - “after the fact” - connoting a correlational study or research in which intact groups are utilized
Threats to Internal Validity
Maturation of subjects (psychological and physical changes including fatigue due to the time involved)
Mortality - subjects withdrawing from the study
Instrumentation - changes in the instruments used to measure the behavior or trait
Statistical regression (the notion that extremely high or low scores would move toward the mean if the measure is utilized again).
What is Internal Validity?
Was/Were the DV(s) truly influenced by experimental IVs or were other factors impacting/impinging on the theorized relationship?
What is External Validity?
Can the experimental research results be generalized to larger populations (i.e., other people, settings, or conditions?)
Factor-analysis
Statistical procedures that identify the important underlying "factors" in order to summarize a large number of variables
Concerned with data-reduction
E.g., a test which measures a counselor's ability may try to describe the three most important variables (factors) that make an effective helper, although literally hundreds of factors may exist.
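A minimal sketch of the data-reduction idea, assuming Python with scikit-learn; the ratings matrix is random placeholder data, so the extracted factors are purely illustrative:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Placeholder ratings of 50 counselors on 6 observed variables
# (in practice these would be real measurements, e.g., warmth,
# listening, empathy, structure, pacing, eye contact).
rng = np.random.default_rng(0)
ratings = rng.normal(size=(50, 6))

# Reduce the 6 observed variables to 3 underlying factors.
fa = FactorAnalysis(n_components=3, random_state=0)
fa.fit(ratings)
print(fa.components_.shape)  # (3, 6): each variable's loading on each factor
```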
Chi-square
non-parametric statistical measure that tests whether a distribution differs significantly from an expected theoretical distribution
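A minimal sketch assuming Python with scipy; the observed counts and the even expected split are hypothetical:

```python
from scipy.stats import chisquare

# 60 hypothetical clients choose among 3 counseling formats; the
# theoretical expectation is an even 20/20/20 split.
observed = [30, 18, 12]
expected = [20, 20, 20]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
# A small p (e.g., < .05) indicates the observed distribution differs
# significantly from the expected theoretical distribution.
```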
Causal-Comparative Design
a true experiment except for the fact that the groups were not randomly assigned
data gleaned from the causal-comparative ex post facto design can be analyzed with a test of significance (e.g., t test or ANOVA) just like any true experiment
Ethics regarding participation in an experiment
Subjects are informed of any risks
Negative aftereffects are removed
Allow subjects to withdraw at any time
Confidentiality of subjects will be protected
Research reports will be presented in an accurate format that is not misleading
Techniques used in the experiment will be ones the experimenters are trained in
Standards for n (number of people in an experiment) for a true experiment, correlational research, and a survey
30 subjects to conduct a true experiment
30 subjects per variable for Correlational research
100 subjects for a survey
Organismic Variable
a variable that the researcher cannot control yet exists, such as height, weight, or gender
to determine if it exists, ask if there is an experimental variable being examined which you cannot manipulate
Hypothesis Testing (who pioneered and what is it?)
R.A. Fisher
Hypothesis is an educated guess which can be tested utilizing the experimental model
a statement which can be tested regarding the relationship of the IV and the DV
Null Hypothesis
there will not be a significant difference between the experimental group which received the IV and the control group which did not.
samples will not change even after the experimental variable is applied
The IV does not affect the DV
Experimental/Alternative/Affirmative Hypothesis
suggests that a difference will be evident between the control group and the experimental group
That the IV affected the DV
t test
used to determine if a significant difference between two means exists
“two-groups” or “two randomized groups” research design
a simple form of ANOVA
after the experiment is run, the researcher consults a t table; if the obtained t value is lower than the critical t in the table, then you accept the null hypothesis. If it is higher than the critical t in the table, you can reject the null hypothesis.
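A minimal sketch of a two-randomized-groups comparison, assuming Python with scipy; the scores are hypothetical:

```python
from scipy.stats import ttest_ind

# Hypothetical post-test scores for two randomized groups.
experimental = [24, 27, 30, 26, 29, 31, 28, 25]  # received the IV
control = [21, 23, 22, 25, 20, 24, 22, 23]       # did not receive the IV

t_value, p_value = ttest_ind(experimental, control)
print(f"t = {t_value:.2f}, p = {p_value:.3f}")
# If the obtained t exceeds the critical t from the table (or p < .05),
# reject the null hypothesis; otherwise accept it.
```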
independent group design
two separate groups of subjects are used; the change in one group does not influence the other group
repeated measures comparison design
measures the same group of subjects both without the IV and with the IV
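A minimal sketch of a repeated-measures comparison using a paired t test, assuming Python with scipy; the scores are hypothetical:

```python
from scipy.stats import ttest_rel

# The SAME subjects are measured without the IV and then with the IV.
without_iv = [12, 15, 11, 14, 13, 16]
with_iv = [15, 18, 13, 17, 15, 19]

t_value, p_value = ttest_rel(with_iv, without_iv)
print(f"t = {t_value:.2f}, p = {p_value:.3f}")
```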