First Exam Flashcards
Reliability
How consistent the entire instrument is; the closer the reliability coefficient is to 1, the more reliable the instrument
Psychometric theory looks at 2 things
One side looks at the entire test (reliability) and the other side looks at item quality (dichotomous & non-dichotomous)
How do you construct an instrument?
- looking at the entire test (reliability) and item quality (non-dichotomous & dichotomous)
The entire test has 4 different types of reliability
-inter-rater, test-retest, internal consistency, parallel forms
Non-dichotomous and how it relates to variance
-you want higher variance to get a better normal curve; the more items you add, the more you increase the variance
Validity
Accuracy: how well a test measures what it is intended to measure (all of probability is based on infinity)
Reliability and error
Error can affect the consistency of scores
2 types of error
Systematic error & random error
Systematic error
Errors that occur consistently because of a particular characteristic of the person being tested (e.g., reading proficiency)
Random error
Errors that occur by chance (blackout, distraction); more common than systematic error
Different types of random error
Content differences, subjective scoring and temporal instability
Content differences (content based)
Non-standardized administrations (may inadvertently speak differently when administering test) ex: court ordered testing or a child using restroom during test
Subjective scoring
Rater differences: raters' subjective views of the client may differ
Temporal instability
Things change from day to day. Ex: one day the test taker had the flu; or the first day of testing went well, but on the second day there was an earthquake and performance went down
What are some ways to decrease measurement error
Writing clear items, making test instructions easily understood, adhering closely to the prescribed conditions for administering an instrument, training raters, and making subjective scoring rules as explicit as possible
Where does most measurement error come from?
Most of it comes from the person administering the test, but it decreases as the administrator becomes more experienced
Test-retest reliability (coefficient of stability)
When you take a single group of subjects and you repeatedly test on the same instrument at different times
What is the gold standard for test-retest reliability
2 weeks between the first test and the second test; this gap gives optimal test-retest reliability
In test-retest reliability what is the difference between the shorter and longer gap?
The longer the time gap, the lower the correlation; the shorter the gap, the more similar the factors that contribute to error
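The coefficient of stability is just the Pearson correlation between the two administrations. A minimal sketch in Python; the five examinees' scores below are hypothetical:

```python
import math

def pearson_r(x, y):
    # Pearson correlation between two lists of scores
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical scores from the same five people, two weeks apart
time1 = [10, 12, 14, 16, 18]
time2 = [11, 13, 13, 17, 19]
print(round(pearson_r(time1, time2), 3))  # 0.962
```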
Artificial inflation
When researchers use the shorter gap to get a better correlation
Parallel forms reliability
Assessing if two forms of the same instrument produce similar results when testing the same person (sometimes hard to achieve)
What is form A & form B (parallel forms reliability)
How reliable the two forms are with one another; having two versions helps eliminate practice effects
What is a key problem with parallel forms reliability?
Difficult to randomly divide and hard to create large number of items
What is a key part of parallel forms reliability?
Developing a large number of items and then randomly dividing them into two tests
Coefficient of equivalence
How correlated a person's scores are across two different forms of the same test
When should the two forms for parallel forms reliability be sent out?
They should be administered at least 2 weeks apart
What happens if the correlation between the two testings is lower than .2?
There is significant measurement error
What happens if you administer the forms on the same day for parallel forms reliability?
The test may reflect state rather than trait, and you will not have a statistically significant difference
Internal consistency reliability
How related items are within the entire scale and within the subscales
What do we want with internal consistency reliability?
The content should be similar for the reliability to be high; you need an adequate number of items, and you want the items to appropriately reflect the underlying construct
Different types of internal consistency reliability
Split-half reliability, Kuder-Richardson #20 (KR-20), Cronbach's alpha
Split half reliability
Split the examinees' scores into two halves and then correlate the scores of the two halves
How does split half reliability look like in speeded tests?
An odd-even split may produce artificially high internal consistency if the examinee runs out of time
How to get good idea of split half reliability
Take the odd-numbered items as one half and the even-numbered items as the other, then correlate the halves; this gives a better idea of split-half reliability
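One standard detail worth adding: the correlation between two half-tests underestimates the reliability of the full-length test, so it is usually corrected with the Spearman-Brown formula. A sketch:

```python
def spearman_brown(r_half):
    # Spearman-Brown correction: estimated full-test reliability
    # from the correlation between the two half-tests
    return 2 * r_half / (1 + r_half)

# e.g. if the odd and even halves correlate at .70:
print(round(spearman_brown(0.70), 2))  # 0.82
```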
What are some problems with split half reliability?
Natural order of test taking (content is not the same with the first half as the second half) & Issue of a timed test (some people don’t get to the second half)
Kuder-Richardson #20 (KR-20)
Formula that provides a split-half-style reliability estimate under the assumption that the questions are scrambled
How does Kuder-Richardson stop a confound in your test?
By stopping the natural order
The drawbacks of KR-20
Only works with dichotomous scoring systems (only allows for right-or-wrong question responses)
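The KR-20 formula is (k/(k-1)) * (1 - sum(p*q) / total-score variance), where p is the proportion answering an item correctly and q = 1 - p. A minimal sketch with hypothetical 0/1 item data:

```python
def kr20(responses):
    # responses: one row of 0/1 item scores per examinee
    n = len(responses)      # examinees
    k = len(responses[0])   # items
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct, item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

# hypothetical data: 5 examinees x 4 dichotomous items
data = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
print(round(kr20(data), 3))  # 0.8
```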
Cronbach's alpha
Can be used to assess internal consistency for those tests that have different scoring systems
When and how can Cronbach's alpha be used?
Can be used on any scoring system and allows for scrambling of the questions; used more than any other measure of internal consistency; equivalent to the average of all possible split-half correlations
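Cronbach's alpha generalizes KR-20 to any scoring system: alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance). A minimal sketch with hypothetical 1-5 Likert responses:

```python
def cronbach_alpha(responses):
    # responses: one row of item scores per examinee (any scoring system)
    n = len(responses)
    k = len(responses[0])

    def pvar(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(row) for row in responses]
    item_vars = [pvar([row[j] for row in responses]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvar(totals))

# hypothetical data: 5 respondents x 3 Likert items
likert = [[3, 4, 3],
          [2, 2, 3],
          [4, 5, 4],
          [1, 2, 2],
          [5, 4, 5]]
print(round(cronbach_alpha(likert), 3))  # 0.929
```

With 0/1 data this function returns the same value as KR-20, which is why alpha is described as the more general measure.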
Internal consistency & Cronbach's alpha
High coefficient alpha does not always mean that you are measuring only one factor or latent construct (unidimensionality)
What do we assume in internal consistency?
We assume unidimensionality, but many tests are inadvertently multidimensional
What do multidimensional tests look like?
More than one factor is being measured (ex: an AP History test measures knowledge, but also writing ability)
How will Cronbach's alpha be increased or artificially inflated?
If the test takers are a homogeneous group; you need heterogeneity in the group (alpha will be more accurate with a general group of people)
Interrater reliability
Assessing the degree of consistency between multiple raters
2 kinds of interrater reliability
Kendall's coefficient of concordance & Cohen's kappa
Kendall’s coefficient of concordance
Degree of consistency amongst raters that rank order people/objects
Rank order consistency. Ex: Miss Universe; different judges rank contestants 1, 2, 3, 4, 5 to see if their rankings correlate with one another
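Kendall's W compares the rank sums each object receives across raters: W = 12S / (m^2 (n^3 - n)), where S is the sum of squared deviations of the rank sums from their mean, m is the number of raters, and n the number of objects (no tied ranks assumed). A sketch with hypothetical judge rankings:

```python
def kendalls_w(rankings):
    # rankings: one list of ranks per rater, all ranking the same n objects
    m = len(rankings)      # raters
    n = len(rankings[0])   # objects ranked
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_rs = sum(rank_sums) / n
    s = sum((rs - mean_rs) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# hypothetical: 3 judges rank 4 contestants (1 = best)
ranks = [[1, 2, 3, 4],
         [1, 3, 2, 4],
         [2, 1, 3, 4]]
print(round(kendalls_w(ranks), 3))  # 0.778
```

W ranges from 0 (no agreement) to 1 (perfect agreement among the judges).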
Cohen's kappa
Degree of consistency amongst raters that classify items into discrete categories
Example of Cohen's kappa
Two raters each classify the same group of 30 people as depressed or not depressed; Cohen's kappa quantifies how consistently they place patients in the same category
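Cohen's kappa corrects raw percent agreement for chance agreement: kappa = (po - pe) / (1 - pe), where po is observed agreement and pe is the agreement expected by chance. A sketch with hypothetical depressed/not-depressed classifications from two raters:

```python
def cohens_kappa(rater_a, rater_b):
    # rater_a, rater_b: category labels assigned to the same subjects
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    cats = set(rater_a) | set(rater_b)
    # chance agreement from each rater's marginal proportions
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)

# hypothetical labels for 10 patients: "d" = depressed, "n" = not depressed
a = ["d", "d", "d", "n", "n", "n", "n", "n", "d", "n"]
b = ["d", "d", "n", "n", "n", "n", "n", "d", "d", "n"]
print(round(cohens_kappa(a, b), 3))  # 0.583
```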
Normal Curve
The probability that an observation under the normal curve lies within 1 SD of the mean is approx. 0.68, within 2 SD approx. 0.95, and within 3 SD approx. 0.997
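These values can be checked from the standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2; the probability of falling within z SDs of the mean is Phi(z) - Phi(-z). A sketch:

```python
import math

def within_sd(z):
    # P(|X - mean| <= z * SD) for a normal distribution
    phi = lambda x: (1 + math.erf(x / math.sqrt(2))) / 2  # standard normal CDF
    return phi(z) - phi(-z)

for z in (1, 2, 3):
    print(z, round(within_sd(z), 4))  # 1 0.6827, 2 0.9545, 3 0.9973
```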
Why is SEM important for testing?
SEM is based on the idea that you cannot test an individual an infinite number of times. Measurement error is always present
How is JND difficult to apply to psychological constructs?
It is used to determine a level of sensory difference (like hearing or sight), but there is variability in the expression of disorders in humans, so a comparable threshold is hard to pin down
What is item analysis and how is it related to test construction?
Examining item quality to map the construct we have defined. We then look at dichotomous and non-dichotomous measures to determine item quality (variance, covariance, etc.)
How does one construct a test?
Need to determine what area or domain you want to examine; homogenous content; tests made for repeated use require validation
Scaling models
Unidimensional, subject centered methods, stimulus centered methods and response centered approaches
Subject centered methods
The test developer's primary interest is locating the individual at different points on the continuum (Likert scale)
Stimulus centered methods
Psychophysics & JND: present tones to determine the absolute threshold for experiencing a sensation; it is not always clear where the difference lies, and not everyone agrees on what the difference is. Requires subject competency to report the JND
Response centered approaches
Each respondent is asked to rank order his or her preference for a set of stimuli or to rank order a set of statements in terms of their proximity to his or her own personal beliefs. Allows to scale psychological distance between separated categories
Heterogeneity
Difference in character or content
Homogenous
Same character or content
Meta-analysis
Combining multiple studies with the same research question
Bivariate
Split a variable into 2 parts
Inferential statistics
Take sample data and make inferences on the population
Descriptive statistics
Looks at trends in the sample and understand them based on the sample itself
Assessment
An overall testing score interpreted in the context of history (holistic)
Testing
A quantitative score with no larger context
Niche building
Creating, seeking out, and ending up in environments that reinforce your traits; we do this consciously and unconsciously
Reliability and standard error of measurement
As the reliability of the instrument increases, the standard error of measurement goes down; if your test is getting consistent results, your error will of course go down
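This relationship is the standard SEM formula: SEM = SD * sqrt(1 - r), where r is the reliability coefficient. A sketch using the IQ scale's SD of 15; the reliability values are illustrative:

```python
import math

def sem(sd, reliability):
    # standard error of measurement: SD * sqrt(1 - reliability)
    return sd * math.sqrt(1 - reliability)

# IQ scale (SD = 15): higher reliability -> smaller SEM
print(round(sem(15, 0.90), 2))  # 4.74
print(round(sem(15, 0.95), 2))  # 3.35
```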
Achievement tests
They are trying to determine if a specific skills set or knowledge base has been acquired
Popham & Husek (1969)
Learned that you cannot use traditional reliability since you are not interested in how someone does in comparison to a group of others— you are interested in how someone performs in regard to a specific criterion
Criterion
Anything that has real-world implications
Ex: if a lawyer fails the bar exam they cannot become a lawyer; these tests affect your real life because they affect you moving forward in a profession
2 objectives achievement tests scores can give you
Relative position of the examinees score in a distribution of scores (z score) & the degree to which the person has attained the goal of a specific instruction (ex: comp exam)
Z score
Measured in terms of standard deviations from the mean, Relative position of the examinees score in a distribution of scores
Proportion correct score
Percentage of correct answers from a randomly determined number of test items (you don’t need to know how others performed if you know the percentage of correct answers obtained)
Criterion referenced tests
Look at development (all of these tests are arbitrary), a test that measures a student’s performance against a set of predetermined standards or criteria.
Domain score
The proportion of items in the domain that the examinee answers correctly
Mastery allocation
Cutoff score that classifies examinees into two categories master vs. non-master (ex: EPPP)
What does a z score allow for
Allows for comparison across variables that are calibrated or scaled differently; it is independent of scaling and calibration
What do z scores do for the WAIS/WISC (IQ tests) & MMPI?
These have different scoring systems, which makes their raw scores not directly comparable, but z-scores allow comparison across them
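A sketch of that comparison: z = (x - mean) / SD. WAIS IQ scores use mean 100 and SD 15; MMPI T-scores use mean 50 and SD 10 (the individual scores below are hypothetical):

```python
def z_score(x, mean, sd):
    # standard score: distance from the mean in SD units
    return (x - mean) / sd

# hypothetical examinee: WAIS IQ of 115, MMPI T-score of 65
print(z_score(115, 100, 15))  # 1.0 -> 1 SD above the mean
print(z_score(65, 50, 10))    # 1.5 -> 1.5 SDs above the mean
```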
Absolute error
Using an examinee's mean score as a representation of his or her true universe score
How is absolute error calculated
By summing all the error variance
How criterion referenced reliability is examined
The lower the error the better the examinees score represents his domain referenced true knowledge
Reliability of classification
-does the observed match onto what we predict? We want to know who passes and fails, as well as how they are classified
Predicted
What we expect to happen
Observed
What has actually happened
The percentage of items people are getting correct will affect
The reliability of achievement tests
Where is the true reliability
It is in the middle of the score distribution, not the tails; we want error in the middle to be lower to show better reliability
Should reliability be high?
Yes, and all types of reliability should be similarly high
Homogenous samples and reliability
Homogeneous samples have lower reliability than heterogeneous samples
Self report and reliability problems (2 major components)
Literal meaning and pragmatic meaning
Literal meaning
Semantic understanding of sentence structure
Pragmatic meaning
Inferences about the question's intent
Issues with reliability & self report
Ex: "How are you doing?" leads to interpretation by the participant in the conversation; this can cause issues with reliability because the client may interpret the question differently
Self reports and reference periods
When asked to respond about something that occurred last week vs. last year, we find differential responding
Differential responding
Respondents interpret a shorter reference period as implying frequency and a longer one as implying intensity of the event
Self reports and question context
Respondents change their answers based on the researcher's affiliation, or the response categories themselves can change the way a patient responds
Self report and context
Preceding questions in a survey or questionnaire influences the ways in which respondents evaluate items
Internet & psychological testing
The Internet provides a cheaper and faster way to update tests, translate tests, and interpret scores quickly; it can reach more respondents quickly, provides access to test materials cheaply, and allows those in rural areas to be tested
Internet and ethical considerations
Test security (keeping the testing items secured), tests that may discriminate, language barriers, minors taking tests, not giving informed consent accurately, how to give feedback to individuals, and how to deal with emotional trauma from results
Psychologists should use what type of tests?
Tests whose validity and reliability have been established for the population being tested
How do you evaluate whether an item or test question is good?
Done through statistical analysis of the test questions
Intrinsic traits
Qualities that are inherent to something or someone and are not dependent on external circumstances
Difference between multidimensional and unidimensional in Cronbach's alpha
Multidimensional tests have a lower Cronbach's alpha; unidimensional tests have a higher one