Assessments Flashcards

1
Q

Alternate Form Reliability

A

Deals with the evidence as to whether two or more allegedly equivalent forms of the same test are actually equivalent (Popham, 2007, p. 33). Multiple test forms are used more often in high-stakes testing than in run-of-the-mill classroom testing. To test for alternate-form reliability, both (or all) forms would have to be administered to the same student(s) with little delay in between. Once you have the scores, you could compute the correlation coefficient reflecting the relationship between students’ performance on the two forms.
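
A minimal sketch of that computation in Python (hypothetical scores; `statistics.correlation` requires Python 3.10+):

```python
import statistics

# Hypothetical scores of the same five students on two allegedly equivalent forms
form_a = [72, 85, 90, 64, 78]
form_b = [70, 88, 87, 66, 80]

# Pearson correlation coefficient; a value near 1.0 supports the claim
# that the two forms are functioning equivalently
r = statistics.correlation(form_a, form_b)
print(round(r, 2))
```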

2
Q

Alternative assessments

A

Often contrasted with “traditional” assessment. According to Brown & Hudson (1998), the defining characteristics are that alternative assessment–

  1. requires students to perform or create;
  2. uses real-world contexts or simulations;
  3. extends day-to-day classroom activity and is non-intrusive;
  4. allows students to be assessed on what they already do every day;
  5. uses tasks that represent meaningful activity;
  6. focuses on processes as well as products;
  7. taps into higher-order thinking and problem-solving;
  8. provides information about students’ strengths and weaknesses;
  9. is multiculturally sensitive;
  10. ensures that humans do the scoring;
  11. encourages open disclosure of standards and rating criteria;
  12. encourages teachers to perform new instructional and assessment roles; and
  13. is continuous and untimed.

According to Brown (2004, Chapter 10) some forms of alternative assessment include–

  1. Portfolios
  2. Journals
  3. Conferences & Interviews
  4. Observations
  5. Self- and peer- assessment

3
Q

Annual Measurable Achievement Objectives

A

The state targets that indicate whether a state or district has met Adequate Yearly Progress.

4
Q

Assessment

A

Appraising or estimating the level or magnitude of some attribute of a person (Mousavi, 2009).

“Assessment” is not synonymous with “test”. Tests are prepared administrative procedures that occur at a regular time in a curriculum when learners must perform at peak ability. Assessment is much broader and encompasses a wide domain of activities and evaluation. Tests are a subset of assessment.

According to Gottlieb (2006) the assessment of ELLs must be inclusive, fair, relevant, comprehensive, valid, and yield meaningful information.

5
Q

Authentic Assessment

A

A form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills (Mueller, 2012)

6
Q

Circumstantial Bilingualism

A

A situation in which an individual must become bilingual because of an outside force (war, school mandates, relocation, etc.)

7
Q

Construct Related Validity Evidence

A

The extent to which empirical evidence confirms that an inferred construct exists and that a given assessment procedure is measuring the inferred construct accurately (Popham, 2007).

8
Q

Content Related Validity Evidence

A

Refers to the extent to which an assessment procedure adequately represents the content of the curricular aim being measured (Popham, 2007).

9
Q

Criterion Referenced Tests

A

Designed to give test-takers feedback, usually in the form of grades, on specific course objectives. It is possible for everyone to get a good grade if all have mastered the objective(s). Criterion-referenced tests can be formative (unit tests, a midterm) or summative (end-of-course tests) (Brown, 2004).

10
Q

Criterion Related Validity Evidence

A

The degree to which performance on an assessment procedure accurately predicts a student’s performance on an external criterion (Popham, 2007)

11
Q

Cut score

A

The point on an assessment scale at which scores at or above that point are interpreted or acted upon differently from scores below it (e.g., 70 = passing, 69 = failing).

12
Q

Domain

A

Refers to the four skills: listening, speaking, reading, and writing. An assessment is often seen to evaluate or focus on one domain.

13
Q

Elective Bilingualism

A

When an individual chooses to learn a second language.

14
Q

Face Validity

A

The degree to which a test looks as though it measures the knowledge or ability it claims to measure. Face validity is subjective, resting on the perceptions of the examinees, the administrators who use the test, and others; it is purely in the “eye of the beholder” and cannot be empirically measured. The appearance of content validity increases the probability of face validity (Brown, 2004).

15
Q

Formative Assessment

A

Evaluating students in the process of forming their competencies or skills. The key to formative assessment is the delivery and internalization of appropriate feedback on performance, and that feedback should inform instruction. Formative assessment takes into account many forms of informal assessment (Brown, 2010).

16
Q

Grade-equivalent scores

A

In the K-12 context, a grade-equivalent score represents the grade level of most students who earn that score. Grade-equivalent scores should be as skill-specific as possible. http://www.hishelpinschool.com/testing/test4.html

17
Q

Internal Consistency Reliability

A

Deals with the extent to which the items in an assessment tool function in a consistent, homogeneous fashion (Popham, 2007). Students should perform consistently on items that assess the same skill.

To test the internal consistency of dichotomous items (items scored in one of two categories, e.g., right/wrong), the Kuder-Richardson method is used. To determine the internal consistency of polytomous items (items with more than two possible score points), Cronbach’s coefficient alpha is used.
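
A minimal sketch of Cronbach’s coefficient alpha using NumPy, assuming hypothetical 0/1 response data; for items scored dichotomously like this, the result is essentially the KR-20 value:

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for an (n_students, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five students answering four dichotomous (right = 1 / wrong = 0) items
answers = [[1, 1, 1, 0],
           [1, 1, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 1, 1],
           [0, 0, 0, 0]]
print(round(cronbach_alpha(answers), 2))  # 0.8
```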

18
Q

mean

A

The mean is the sum of the values divided by the number of values. http://en.wikipedia.org/wiki/Mean
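
A one-line worked example (hypothetical scores):

```python
scores = [70, 85, 90, 65, 85]
print(sum(scores) / len(scores))  # 395 / 5 = 79.0
```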

19
Q

Median

A

A median is described as the numerical value separating the higher half of a sample from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one.
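
A minimal sketch of that procedure (hypothetical scores; an even-length list would instead average the two middle values):

```python
scores = [70, 85, 90, 65, 85]
ordered = sorted(scores)           # [65, 70, 85, 85, 90]
print(ordered[len(ordered) // 2])  # middle value of an odd-length list: 85
```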

20
Q

Multiple Measures

A

Valenzuela (2002) states that one needs to take into account multiple measures of ELLs’ proficiency and development. Multiple measures help one triangulate proficiency and enhance construct validity.

21
Q

Norm-referenced tests

A

Each test-taker’s score is interpreted in relation to a mean, median, standard deviation, and percentile rank. The purpose of such tests is to place test-takers along a mathematical continuum in rank order. Norm-referenced tests include the SAT and TOEFL (Brown, 2004)
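
A minimal sketch of that kind of interpretation, assuming a hypothetical norm group and taking percentile rank simply as the share of scores below the test-taker’s:

```python
import statistics

norm_group = [52, 61, 68, 70, 74, 77, 81, 85, 88, 94]  # hypothetical cohort scores
score = 81

z = (score - statistics.mean(norm_group)) / statistics.stdev(norm_group)
pct_rank = 100 * sum(s < score for s in norm_group) / len(norm_group)
print(f"z = {z:.2f}, percentile rank = {pct_rank:.0f}")  # z = 0.47, percentile rank = 60
```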

22
Q

Normal Curve Equivalent

A

A normalized score based on the normal distribution (the “bell curve”); NCE scores range from 1 to 99 with a mean of 50.

23
Q

Performance-Based Assessment

A

In terms of language assessment, this refers to the type of assessment that involves oral production, written production, open-ended responses, integrated performance, group performance, and other interactive tasks. This type of assessment is time-consuming and expensive, but it has more content validity because learners are measured in the process of performing linguistic acts (Brown, 2004). PBA is not universally considered a form of alternative assessment, but the two share some characteristics.

Some characteristics of PBA are-

  1. Students make a constructed response
  2. Student engage in higher-order thinking
  3. Tasks are meaningful, engaging, and authentic
  4. Tasks call for the integration of language skills
  5. Both the process and the product are assessed
  6. Depth of mastery is emphasized over breadth

PBA should be treated with the same rigor as traditional tests (Brown, 2004). When scoring PBA, teachers need to be careful that they assess language features, not just surface features; otherwise problems with inter-rater reliability may arise.

24
Q

Portfolio Assessment

A

An alternative form of assessment that is a purposeful collection of students’ work that demonstrates their efforts, progress, and achievement in given areas (Genesee & Upshur, 1996).

Examples of items that can be collected in portfolio assessment include: essays with drafts, project outlines, poetry, creative prose, artwork, photos, clippings, audio/video, journals, diaries, reflections, tests, homework, notes on lectures, and self-assessments.

Gottlieb (1995) uses the acronym CRADLE for six possible attributes of a portfolio: Collecting, Reflecting, Assessing, Documenting, Linking, and Evaluating.

Several reports show the advantages and benefits of portfolio assessment (Genesee & Upshur, 1996, and others, in Brown, 2004, p. 257), including that portfolios–

  1. create a sense of intrinsic motivation, responsibility, and ownership
  2. promote student-teacher interaction
  3. individualize learning
  4. provide tangible evidence
  5. foster critical thinking
  6. create the opportunity for collaborative work
  7. permit assessment in multiple dimensions of learning

However, a portfolio must not become a “pile of junk”; to prevent this, Brown (2004) suggests that teachers take the following steps–

  1. State objectives clearly
  2. Give guidelines on what material to include
  3. Communicate assessment criteria
  4. Designate time for portfolio development
  5. Establish a schedule for review
  6. Keep portfolios in an accessible place
  7. Provide positive washback before final assessment

25
Q

Psychometrics

A

The field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes, personality traits, and educational measurement. The field is primarily concerned with the construction and validation of measurement instruments such as questionnaires, tests, and personality assessments (Wikipedia).

26
Q

Reliability

A

The consistency with which a test measures what it says it measures (Popham, 2007). Reliability should be synonymous with consistency. The term may also refer to the extent to which students’ test scores are free of measurement error. Popham claims that teachers should NOT calculate the reliability of normal classroom tests, but that it is important for teachers to understand the concept because they will often have to explain to others (parents, other teachers, administrators) that a test is a reliable measure, and perhaps to be more critical of assessment choices made by districts and states. He also states that the higher the stakes of a test, the more attention should be paid to reliability.

Brown (2004, p. 20) defines a reliable test as one that is consistent and dependable.

27
Q

Rubrics

A

A type of scoring guide that is frequently used across content areas and in the assessment of ELLs. Rubrics describe the goals for learning and identify the extent to which students have met them (Gottlieb, 2006, p. 115). Rubrics have numerous useful applications, including:

  1. Offering a uniform set of criteria for anchoring the learner and scorer;
  2. Identifying target benchmarks and the steps to reach them;
  3. Offering a shared set of expectations;
  4. Establishing a uniform process for teachers to analyze and interpret student work;
  5. Helping to translate standards into practice;
  6. Offering a focus for instruction and assessment;
  7. Promoting collaboration among teachers;
  8. Providing continuity between grades/levels/classes;
  9. Promoting consensus building among teachers; and
  10. Organizing a framework for reporting results.

Gottlieb (2006, pp. 115-116) warns that rubrics can be overused and abused: they can be misinterpreted as standards or limit learners to certain goals. She explains that SLL rubrics represent a continuum of development, while academic proficiency rubrics represent absolute performance. Types of rubrics include checklists, rating scales, holistic scales, task-specific scales, and analytic scales (matrices).

28
Q

Stability Reliability/Test-Retest Reliability

A

Deals with the consistency of test results over time (Popham, 2007). To test for stability reliability, it is important that no significant event (one that would alter student performance) occur between the two testing occasions. One would determine the correlation coefficient of the scores on the two administrations; a high coefficient (>.80) indicates that the results are quite similar. One may also look at classification consistency instead, which categorizes results by how consistently students met a standard (e.g., passing) across the two tests. The classification consistency would be the percentage of students who passed both times plus the percentage of students who failed both times.
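
A minimal sketch of the classification-consistency calculation, assuming hypothetical scores and a hypothetical cut score of 70:

```python
# The same six students tested twice; scores at or above CUT count as passing
test_1 = [75, 62, 88, 70, 55, 91]
test_2 = [78, 65, 85, 68, 58, 90]
CUT = 70

pairs = list(zip(test_1, test_2))
both_pass = sum(a >= CUT and b >= CUT for a, b in pairs)
both_fail = sum(a < CUT and b < CUT for a, b in pairs)
print(f"{(both_pass + both_fail) / len(pairs):.0%}")  # 83% classified the same way both times
```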

29
Q

Standardized test

A

Presupposes certain standard objectives, or criteria, that are held constant from one form of the test to another. Standardized tests are large-scale tests designed to apply to a broad band of competencies that are not exclusive to one particular curriculum (Brown, 2004).

30
Q

Standards-based test

A

Presupposes an accepted set of standards on which to base assessment procedures for specific grade levels.

31
Q

Summative Assessment

A

Measuring or summarizing what a student has grasped, typically at the end of a course of instruction. It assesses how well a student accomplished objectives but does not point the way to future progress (Brown, 2010).

32
Q

Validity

A

The extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment (Brown, 2004). Validity also deals with the extent to which those interpretations are supported by various types of validity evidence (Popham, 2007).

33
Q

Washback effect

A

The effect of testing on teaching and learning (Hughes, 2003). The extent to which assessment affects a student’s future language development. It can also refer to the “washing back” of diagnostic knowledge of strengths and weaknesses to the student. Teachers should strive to make classroom tests that enhance positive and effective washback (Brown, 2004).