NCE Study Flashcards
Types of Validity (Testing)
Content Validity - Do the items on the test comprehensively examine what construct is being measured?
Construct Validity - How well does the test measure what it purports to measure with regard to an abstract trait or psychological notion (within psychology and psychometrics).
Criterion Validity - 2 types
Concurrent Validity - How does this test compare to other measures that purport to measure the intended construct. Generally the test in question is being examined against well-established measures that purport to measure the same construct.
Predictive Validity - How well does the test predict future behavior?
Consequential Validity - Social implications of using tests
Convergent Validity - method used to assess a test’s construct/criterion validity by correlating test scores with an outside source
Discriminant Validity - test will not reflect unrelated variables; no correlation will be found between two constructs that appear to have no connection (E.g. If phobias are unrelated to IQ, then when one correlates clients’ IQ scores to their scores on a test for phobias, there shouldn’t be any significant correlation)
Ipsative vs. Normative
Ipsative measures compare traits/scores within the same individual within the same measure (KOIS)
Normative measures compare traits/scores with others either with the same measure or different measures (utilizing statistics such as percentile/percentile rank to show differences)
Power vs. Speed (Testing)
Power tests focus on raw mastery without regard for the ability to complete the tasks within a time crunch/shortened time frame
Speed tests are normally easier item by item, generally cannot be completed before the “timer” of that test is up, and focus on efficiency/speed of the task rather than mastery
Spiral vs. Cyclical (Testing)
Spiral Tests get progressively more difficult
Cyclical Tests utilize “mini” spiral tests within the same measure but on different sections
Validity vs. Reliability
Validity - How well the test/measure/battery is able to measure a given construct
Reliability - How well the test/measure/battery is able to produce consistent/reliable results (e.g. broken scale)
Incremental Validity
used to describe the process by which a test is refined and becomes more valid as contradictory items are dropped
refers to a test’s ability to improve predictions when compared to existing measures that purport to facilitate selection in business or educational settings
when a test has incremental validity - it provides you with additional valid information that was not attainable via other procedures
Synthetic Validity
helper or researcher looks for tests that have been shown to predict each job element or component; tests that predict each component (criterion) can then be combined to improve the selection process
How to test Reliability?
parallel forms - each form has the same psychometric/statistical properties as the original instrument
In order to establish a reliability correlation coefficient, a single group takes parallel forms of a test and the two sets of scores are correlated. Half the individuals get Form A first and the other half Form B, to control for fatigue, practice, and motivation effects.
test-retest reliability - tests for consistency or “stability” which is the ability of a test score to remain stable or fluctuate over time when the client takes the test again
- generally wait at least 7 days before re-test - only valid for traits such as IQ, which remain stable over time and are not altered by mood, memory or practice effects
split-half method - splitting a test/measure in half by using even items as one test and odd items (or a random assignment of items) as a second test and correlating them.
In this situation, the individual takes the entire test as a whole and then the test is divided into halves. The correlation between the half scores yields a reliability coefficient. (You can’t simply administer the first half and then the second half, because practice and fatigue effects may confound the data.)
scorer reliability - (interrater/interobserver) - several raters assess the same performance and is normally utilized with subjective tests such as projectives to ascertain whether the scoring criteria are such that two people will produce roughly the same score
What is a reliability coefficient and what does it represent?
An example .70 reliability coefficient denotes that 70% of the score is accurate while 30% is inaccurate. That is to say that 70% of the obtained score on the test represented the true score on the construct/personality attribute while 30% of the obtained score could be accounted for by error. 70% is true variance while 30% constitutes error variance
How to demonstrate the variance of one factor accounted for by another?
Square the correlation coefficient and multiply by 100. If it is .70, then .70^2 = .49 x 100 = 49%. This is the coefficient of determination.
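A quick arithmetic check of the flashcard’s example in Python:

```python
# Coefficient of determination: square the correlation, multiply by 100.
r = 0.70                        # coefficient from the flashcard example
shared_variance = r ** 2 * 100  # percent of variance accounted for
print(round(shared_variance))   # → 49
```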
Francis Galton’s views on intelligence
intelligence was normally distributed like height/weight and that it was primarily genetic
Charles Spearman
General ability (g) and specific abilities (s); g was thought to be applicable to any mental task
Fluid vs Crystallized Intelligence
Fluid intelligence is flexible, culture-free, and adjusts to the situation
Crystallized intelligence is accumulated, learned knowledge; it is relatively rigid and does not readily change or adapt to novel situations
Internal consistency reliability (e.g. Homogeneity) of a test without split-half
Kuder-Richardson Coefficients of Equivalence. “Interitem consistency”: “is each item on the test measuring the same thing as every other item?”
Cronbach’s Alpha
Phenomenon where a person taking an inventory often tries to answer the questions in a socially acceptable manner
Social Desirability (the right way to feel in society); Deviation occurs when the person is in doubt and will give unusual responses
Standard Error of Measurement
denotes how accurate or inaccurate a test score is. If a client decided to take the same test over and over you could plot a distribution of the scores. A low standard error means high reliability.
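The flashcard does not give the formula, but the standard error of measurement is conventionally computed as SEM = SD x sqrt(1 - reliability). A minimal Python sketch with hypothetical values:

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# Hypothetical values: an IQ-style test with SD = 15 and reliability .89.
sd, reliability = 15, 0.89
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))
```

A common use: a roughly 68% confidence band for the true score is the obtained score plus or minus 1 SEM. Note that as reliability rises toward 1.0, the SEM shrinks toward 0.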
Test length and its effect on reliability and what test is used to measure the differences pre/post
Increasing test length raises reliability
Shortening test length reduces reliability
(thus increasing or reducing reliability coefficient)
The Spearman Brown formula is used to estimate the impact that this has on a test’s reliability coefficient.
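The Spearman-Brown prophecy formula is r_new = (n * r) / (1 + (n - 1) * r), where n is the factor by which the test length changes. A short sketch:

```python
def spearman_brown(r: float, n: float) -> float:
    """Estimate the reliability of a test whose length changes by factor n."""
    return (n * r) / (1 + (n - 1) * r)

# Doubling a test whose reliability is .60 (n = 2) raises reliability:
print(round(spearman_brown(0.60, 2), 3))    # → 0.75
# Halving the same test (n = 0.5) lowers it:
print(round(spearman_brown(0.60, 0.5), 3))  # → 0.429
```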
Self-reports are not effective when used in isolation because…
Clients may give inaccurate answers such as those involving social desirability, not wanting to disappoint the counselor, not wanting to reveal difficulty, etc. The report could be biased and this is a “reactive effect” of self-monitoring
What should the client/public know about testing?
That a single test is a single piece of data and it is not infallible. Clients should know the limitations of testing.
How did the Parisian test (Binet’s IQ Test) become Americanized?
Lewis Terman, who was associated with Stanford University, thus becoming the Stanford-Binet
What is the Item Difficulty Index? And what is the difference between receiving a 0.0 and a 1.0? Further, what do tests aim for regarding the difficulty index?
The item difficulty index is calculated by taking the number of persons tested who answered the item correctly/ total persons tested. 1.0 indicates that all people who answered a test or test item, answered it correctly; conversely 0.0 indicates that no one answered correctly. Tests generally aim for .5 for difficulty indices.
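The index can be sketched as a one-line calculation mirroring the 0.0 / 0.5 / 1.0 cases above:

```python
# Item difficulty index p = number who answered correctly / total number tested.
def item_difficulty(num_correct: int, num_tested: int) -> float:
    return num_correct / num_tested

print(item_difficulty(50, 100))   # → 0.5 — the level tests generally aim for
print(item_difficulty(100, 100))  # → 1.0 — everyone answered correctly
print(item_difficulty(0, 100))    # → 0.0 — no one answered correctly
```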
What is an experiment?
The most valuable type of research. Utilizes treatment controls via the experimenter and randomization/random assignment used in group selection. An experiment attempts to control for/eliminate extraneous variables in order to assess theorized cause and effect relationships.
Quasi-Experiment
Researcher uses pre-existing (intact) groups; subjects are not randomly assigned. You cannot state with any degree of statistical confidence that the IV caused the DV.
Ex post facto study - “after the fact” - connoting a correlational study or research in which intact groups are utilized
Threats to Internal Validity
Maturation of subjects (psychological and physical changes including fatigue due to the time involved)
Mortality - subjects withdrawing from the study
Instruments used to measure the behavior or trait
Statistical regression (the notion that extremely high or low scores will move toward the mean if the measure is utilized again).
What is Internal Validity?
Was/Were the DV(s) truly influenced by experimental IVs or were other factors impacting/impinging on the theorized relationship?
What is External Validity?
Can the experimental research results be generalized to larger populations (i.e., other people, settings, or conditions?)
Factor-analysis
Statistical procedures that identify the important underlying “factors” in order to summarize a large number of variables
Concerned with data-reduction
E.G. A test which measures a counselor’s ability may try to describe the three most important variables (factors) that make an effective helper, although literally hundreds of factors may exist.
Chi-square
non-parametric statistical measure that tests whether a distribution differs significantly from an expected theoretical distribution
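The chi-square statistic itself is the sum of (observed - expected)^2 / expected over all categories. A minimal sketch with hypothetical die-roll counts against a uniform expected distribution:

```python
# Chi-square statistic: sum of (O - E)^2 / E over categories.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(round(chi_square(observed, expected), 2))
```

The obtained value would then be compared to a critical chi-square value for the appropriate degrees of freedom (here, categories minus 1 = 5).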
Causal Comparative Design
a true experiment except for the fact that the groups were not randomly assigned
data gleaned from the causal comparative ex post facto design can be analyzed with a test of significance (e.g. t test or ANOVA) just like any true experiment
Ethics regarding participation in an experiment
Subjects are informed of any risks
Negative after effects are removed
Allow subjects to withdraw at any time
Confidentiality of subjects will be protected
Research reports will be presented in an accurate format that is not misleading
Use of techniques in experiment will be those that experimenters are trained in
Standards for n (number of people in an experiment) for a true experiment, correlational research, and a survey
30 subjects to conduct a true experiment
30 subjects per variable for Correlational research
100 subjects for a survey
Organismic Variable
a variable that the researcher cannot control yet exists, such as height, weight, or gender
to determine if it exists, ask if there is an experimental variable being examined which you cannot manipulate
Hypothesis Testing (who pioneered and what is it?)
R.A. Fisher
Hypothesis is an educated guess which can be tested utilizing the experimental model
a statement which can be tested regarding the relationship of the IV and the DV
Null Hypothesis
there will not be a significant difference between the experimental group which received the IV and the control group which did not.
samples will not change even after the experimental variable is applied
The IV does not affect the DV
Experimental/Alternative/Affirmative Hypothesis
suggests that a difference will be evident between the control group and the experimental group
That the IV affected the DV
t test
used to determine if a significant difference between two means exists
“two-groups” or “two randomized groups” research design
simplistic form of ANOVA
after the experiment is run, researcher goes to a t table; if the t value is lower than the critical t in the table, then you accept the null hypothesis. If the number is higher than the critical t in the table, you can reject the null hypothesis.
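The decision rule can be illustrated in Python. The pooled-variance independent t statistic is standard; the score data are hypothetical, and the critical value (two-tailed, alpha = .05, df = 8) is the one you would look up in a t table:

```python
import math
from statistics import mean, variance

# Independent two-sample t statistic with pooled variance.
def t_statistic(a, b):
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

control = [10, 12, 11, 13, 9]        # hypothetical scores, no IV
experimental = [14, 15, 13, 16, 14]  # hypothetical scores, IV applied

t = t_statistic(experimental, control)
critical_t = 2.306  # two-tailed, alpha = .05, df = 8, from a t table
print(round(t, 3), "reject null" if abs(t) > critical_t else "accept null")
```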
independent group design
the change in one group did not influence the other group
repeated measures comparison design
measured the same group of subjects without the IV and with the IV
between-subjects design vs within-subjects design
when a research study uses different subjects for each condition vs. the same subjects (such as in repeated measures) for each condition
Each subject receives only one value of the IV in a between subjects design
Two or more values or levels of the IV are administered to each subject in a within-subjects design
What does P mean in a test of significance?
Probability or level of significance or confidence level
Traditionally, the probability in social science research has been set at .05 or lower (.01 or .001). The .05 level indicates that difference would occur via chance only 5 times out of 100. The significance level must be set before the experiment begins.
P=.05 - differences in the experimental group and the control group are evident at the end of the experiment and the odds are only 1/20 that this can be explained by chance factors.
Parameter vs. Statistic
A parameter is technically a value obtained from a population, while a statistic is a value drawn from a sample. A parameter summarizes a characteristic of a population (e.g. the average male height)
Type I error vs Type II error
Type 1 Error - Alpha
This occurs when a researcher rejects the null hypothesis when it is true
Type 2 Error - Beta
This occurs when a researcher accepts the null hypothesis when it is false
RA - Reject when true (alpha) Accept when false (beta)
The probability of committing a Type 1 error equals the level of significance mentioned earlier. Therefore, the level of significance is often referred to as the “alpha level”
1 minus beta is called “the power of the statistical test” - power connotes a statistical test’s ability to correctly reject a false null hypothesis
Parametric test vs. nonparametric tests
Parametric tests have more power than nonparametric tests
and are used only with interval and ratio data
If the significance level changes from .05 to .001 then… what happens to alpha/beta errors?
Alpha errors decrease; beta errors increase
ANOVA - analysis of variance
when doing an experiment with more than two groups
ANOVAs yield an F statistic, which is then compared to that on an F table for critical F.
If F obtained exceeds critical F, then null hypothesis is rejected.
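That decision rule can be illustrated by computing F by hand (between-groups mean square over within-groups mean square) on made-up data for three groups:

```python
from statistics import mean

# One-way ANOVA F statistic for three or more groups (hypothetical data).
def f_statistic(*groups):
    grand = mean(x for g in groups for x in g)
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total subjects
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

g1, g2, g3 = [3, 4, 5], [6, 7, 8], [9, 10, 11]
print(round(f_statistic(g1, g2, g3), 2))  # → 27.0
```

The obtained F would then be compared to critical F with (k - 1, n - k) degrees of freedom, here (2, 6).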
ANCOVA
analysis of covariance
tests two or more groups while controlling for extraneous variables that are called covariates
Kruskal-Wallis
used instead of one-way ANOVA when the data are nonparametric
Wilcoxon signed rank test
used in place of a t test when the data are nonparametric and you wish to test whether two correlated means differ significantly
co - correlated
Mann-Whitney U-Test
determine whether two uncorrelated means differ significantly when data nonparametric
U means uncorrelated
Spearman correlation or Kendall’s tau
used in place of the Pearson r when parametric assumptions cannot be utilized
MANOVA
study has more than one DV
multivariate analysis of variance
Correlation Coefficient
a statistic that indicates the degree or magnitude of relationship between two variables
if the correlation is negative, as one variable goes up, the other goes down. if the coefficient is closer to -1, it is a stronger relationship.
Biserial Correlation
indicates that one variable is continuous (measured using an interval scale) while the other is dichotomous. For example, if you decided to correlate state licensing exam scores to NCC status (here the dichotomy is licensed/unlicensed). If both variables are dichotomous, then a phi-coefficient correlation is necessary.
Describe a “no correlation” or 0.0 correlation coefficient situation
As one variable changes, the other variable varies randomly
Bivariate/Multivariate correlational paradigm
when correlational data describes the nature of two or more variables
Single Blind vs Double Blind Study
the subject doesn’t know whether he or she is a member of the control or experimental group but the researcher does; conversely, in a double blind, the researcher doesn’t know to which group subjects are assigned either. The experimenters may also be unaware of the hypothesis for the study.
this helps to eliminate “demand characteristics” - which are cues or features of a study which suggest a desired outcome
demand characteristics
cues or features of a study which suggest a desired outcome
a subject can manipulate and confound an experiment by purposely trying to confirm or disprove the experimental hypothesis
experimenter effects
might unconsciously communicate his or her intent or expectations to the subjects
Single Subject Research Design
AB, ABA, ABAB
AB or ABA time series - initially popularized by behavior modifiers in the 1960s and ’70s
AB and ABA studies rely on continuous measurement
A baseline is secured (A); the intervention is implemented (B); the outcome is examined via a new baseline (A)
ABAB can be used to better rule out extraneous variables; if ABAB is used and the second AB mimics the first, then the chances increase that B caused the changes to A rather than extraneous variables
ABA and ABAB may be known as withdrawal designs
Pearson R vs Spearman rho
Pearson r is used for interval or ratio data, while
Spearman rho is used for ordinal (rank) data
What is the benefit of standard scores such as percentiles, t-scores, z-scores, stanines, or standard deviations over raw scores?
A standard score allows you to analyze the data in relation to the properties of the normal bell-shaped curve
When a horizontal line is drawn under a frequency distribution it is known as … ?
the X axis
Abscissa
x-axis
plots the IV - the factor manipulated via the experimenter
Ordinate
y axis
plots the frequency of the DVs
If a distribution is bimodal, then there is a good chance that…
the researcher is working with two distinct populations
If an experiment can be replicated by others with almost identical findings, then the experiment…
is said to be reliable
The range is a measure of variance and usually is calculated by determining the difference between the highest and the lowest score. Thus, on a test where the top score was a 93 and the lowest score was a 33 out of 100, the range would be:
60
but some tests and statistics texts define the range as the highest score minus the lowest score plus 1. This is known as the “inclusive range” vs. the “exclusive range”
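Both definitions applied to the flashcard’s example (high score 93, low score 33):

```python
# Exclusive vs. inclusive range; the score list is hypothetical apart from the extremes.
scores = [93, 57, 41, 33]
exclusive_range = max(scores) - min(scores)
inclusive_range = exclusive_range + 1
print(exclusive_range, inclusive_range)  # → 60 61
```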
“Measures of variability”
Statistics that measure the spread of scores
A sociogram is to a counseling group as a scattergram is to
a. the normal curve
b. the range
c. the correlation coefficient
d. the John Henry Effect
a correlation coefficient
A scattergram/scatterplot is a pictorial diagram/graph of two variables being correlated
Scattergram/scatterplot
a pictorial diagram/graph of two variables being correlated
John Henry Effect (threat to internal validity)
occurs when subjects strive to prove that an experimental treatment that could threaten their livelihood really isn’t all that effective
Example: Counselor educators were asked to use computers as part of the teaching experience but were worried that the computers might ultimately take their jobs! The counselor educators in the comparison control group might purposely spend more time preparing their materials and give students more support than they normally would.
“Resentful Demoralization of the Comparison Group” or “compensatory equalization”
a control group phenomenon that threatens internal validity in research
Comparison group lowers their performance or behaves in an inept manner because they have been denied the experimental treatment. When this occurs, the experimental group looks better than it should. If the comparison group deteriorates throughout the experiment while the experimental group does not, then demoralization could be noted. This could be measured via a pretest and a posttest.
A counselor educator is teaching two separate classes in individual inventory. In the morning class the counselor educator has 53 students and in the afternoon class she has 177 students. A statistician would expect that the range of scores on a test would be
a. greater in the afternoon class than the morning class
b. smaller in the afternoon class
c. impossible to speculate about without more data
d. nearly the same in either class
assuming there was no other information about the classes, then the range of scores on a test would be greater in the afternoon class than the morning class.
Range generally increases with sample size
Range generally increases with sample size (True or False)
True, more items/people/etc increases variability in scores.
The variance is a measure of dispersion of scores around some measure of central tendency. The variance is the standard deviation squared. A popular IQ test has a standard deviation (SD) of 15. A counselor would expect that if the mean IQ score is 100, then…
Hint: (think bell curve)
68% of the people who take the test will score between 85 and 115
Statistically speaking 68.26% of the scores will fall within plus or minus 1 SD of the mean. 95.44% of scores will fall within plus or minus 2 SD of the mean and 99.74% of the scores fall within plus or minus 3 SD of the mean.
The greater the SD, the greater the spread
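The 68/95/99.7 percentages above can be recovered from the normal curve with the error function, since P(|z| <= k) = erf(k / sqrt(2)). A quick check in Python:

```python
import math

# Proportion of scores within plus or minus k SD of the mean on a normal curve.
def within_sd(k: float) -> float:
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{within_sd(k):.2%}")
```

Minor rounding aside, the printed values match the flashcard’s 68.26%, 95.44%, and 99.74% figures.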
The standard deviation is the square root of the variance. A z-score of +1 would be the same as:
a. one standard deviation above the mean
b. one standard deviation below the mean
c. the same as a so-called T-score
d. the median score if the population is normal
a. one standard deviation above the mean
Z scores are the same as standard deviations! In fact z-scores are often called standard scores.
Z-score
same as standard deviations; often called standard scores
can be negative or positive. if it is negative then it is “below the mean”
Stanine score
divides the distribution into nine equal parts with 1 the lowest and 9 the highest portion of the curve when looking at a normal distribution
t-score
transformed scores
uses a mean of 50 with each SD as 10
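The z-score and T-score conversions can be sketched together; the raw score, mean, and SD below are hypothetical (an IQ-style scale):

```python
# z = (raw - mean) / SD; T = 50 + 10z (mean 50, SD 10).
def z_score(raw: float, mean: float, sd: float) -> float:
    return (raw - mean) / sd

def t_score(z: float) -> float:
    return 50 + 10 * z

# Hypothetical test: mean 100, SD 15; a raw score of 115 is 1 SD above the mean.
z = z_score(115, 100, 15)
print(z, t_score(z))  # → 1.0 60.0
```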
Rogers (Person-Centered)
Individual is good and moves toward growth and self-actualization
Berne (transactional analysis)
Messages learned about self in childhood determine whether person is good or bad, though intervention can change this script
Freud (Psychoanalysis)
Deterministic; people are controlled by biological instincts; are unsocialized, irrational; driven by unconscious forces such as sex and aggression
Ellis (Rational-Emotive Behavior Therapy)
People have a cultural/biological propensity to think in a disturbed manner but can be taught to use their capacity
Perls (Gestalt)
People are neither bad nor good; they have the capacity to govern life effectively as “whole” persons. People are part of their environment and must be viewed as such
Glasser (Reality Therapy)
Individuals strive to meet basic physiological needs and the needs to be worthwhile to self and others. Brain as control system tries to meet needs.
Adler (Individual Psychology)
Man is basically good; much of behavior is determined via birth order
Jung (Analytic Psychology)
Man strives for individuation or a sense of self-fulfillment
Skinner (Behavior Modification)
Humans are like other animals: a mechanistic, controlled view; behavior is shaped by environmental stimuli and reinforcement contingencies; neither good nor bad; no self-determination or freedom
Bandura (Neobehavioristic)
Person produces and is a product of conditioning. Social learning theory
Frankl (Logotherapy)
Existential theory that humans are good, rational, and retain freedom of choice