Measuring Learning Flashcards
Explain current enrollment trends in developing countries (and their relationship with GDP)
Enrollment in today’s poor countries is far higher than enrollment was in rich countries when those rich countries were poor. GDP today is a strong predictor of learning levels, given that most countries fall pretty close to the prediction line.
When it comes to testing validity, in an ideal world, policymakers should care most about tests that:
Predict longer term outcomes that we care about
predictive validity of a test
The predictive validity of a test is its ability to predict longer-term outcomes such as income, crime, etc.
concurrent validity of a test
The concurrent validity of a test is how it correlates with other validated tests.
Convergent-discriminant validity
Convergent-discriminant validity is whether a test is correlated more with tests that measure similar concepts, and less with tests that measure different concepts.
Test-retest reliability
The test-retest reliability of a test is the consistency with which it measures any given skill across repeated administrations.
Administrative data are often used to determine the resources that schools, students and households receive. Because of these policies:
Enrollment and attendance both tend to be inflated
Policies tend to give schools more resources when they have higher enrollment and provide incentives to parents based on attendance. Therefore, schools have incentives to inflate enrollment and attendance.
What is the most basic measure of teacher effort?
Teacher attendance is “the most basic measure of teacher effort”. It can be measured through principal and student surveys. However, unless teacher behavior is an explicit step in the theory of change, collecting it may not be necessary, given how costly effort data can be to collect.
Barriers to school participation?
Convenience and access; out-of-pocket costs; health issues; parents and children may underestimate the long-term benefits of education or heavily discount the future
How common is absenteeism of teachers/service providers in developing countries?
Absenteeism is widespread and unpredictable
Even when present, often not teaching
Few service providers face a serious threat of being fired for excessive absences
With almost 100% primary enrollment, why are students still struggling to learn?
Enrollment itself doesn’t mean that students are regularly attending school
Being in school does not mean that children are learning
What is the main approach we could use to improve learning in developing countries?
There are many options, but our focus is to pivot expenditure from less to more cost-effective policies, improving outcomes at any given level of per capita income.
Attendance conditional on enrollment
The fraction of those enrolled who are present on a given day
School attendance in the population
The percentage of school days the average child in a given population is in school
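The two measures relate arithmetically: if we assume children who are not enrolled never attend, population attendance is the product of the enrollment rate and attendance conditional on enrollment. A minimal sketch with hypothetical numbers:

```python
# Hypothetical example relating the two attendance measures.
# Assumption: children who are not enrolled never attend school.

enrollment_rate = 0.90             # fraction of children enrolled
attendance_given_enrolled = 0.80   # fraction of enrolled children present on a given day

# School attendance in the population:
population_attendance = enrollment_rate * attendance_given_enrolled
print(round(population_attendance, 2))  # 0.72: 72% of all children are in school on a given day
```

This is why near-universal enrollment can coexist with much lower day-to-day attendance in the population.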
Why is it important to collect both enrollment and attendance data for a study?
– If we only collect enrollment data, we must assume the program did not change the attendance rate of those enrolled
– If that assumption fails, our measures of impact and cost-effectiveness may be biased
– It is therefore good to supplement enrollment data with direct attendance data
2 ways to measure teacher attendance
– Teacher attendance records (but often fudged)
– Direct observation during surprise visits (need to do this early in the visit)
4 ways to measure teaching efforts
– Classroom observations
– Student surveys
– Principal surveys
– Teacher knowledge tests
2 ways to measure teacher knowledge
– Subject matter knowledge
– Subject-specific pedagogical knowledge
Purpose of classroom observations
systematize observers’ perceptions of teacher quality
Classroom observations - should they be short or long?
Short observations are efficient: multiple short observations offer more information than a single observation of the same total length.
Classroom observations - should teachers be able to choose their own lessons?
Teachers can choose their lessons. It doesn’t make it harder to identify effective teachers; in fact, it makes it easier.
Classroom observations - How should you incorporate principals?
Principals are useful observers. They rate their own teachers higher, but their ratings are highly correlated with those of other observers.
Classroom observations - should you add another observer?
Adding an observer pays off more than adding another lesson.
Classroom observations - predictive validity
Classroom observation ratings in a given year predict teachers’ value-added in the following year, even after random assignment.
However, predictive validity varies across instruments and along the performance distribution (some instruments are better at identifying low- or high-performing teachers).
It also varies across subjects (all instruments are better at predicting value-added in math) and according to the types of skills assessed.
Student surveys - purpose
measure students’ perceptions of teacher quality
Tripod survey - length & structure?
Length:
– Full: 67 questions (elementary) or 92 questions (secondary)
– Lite: 36 questions (elementary and secondary)
• Structure: 7 “Cs”
1. Care (does the teacher care about the student?)
2. Control (is the teacher in control of the classroom?)
3. Clarify (does the teacher clarify difficult concepts?)
4. Challenge (does the teacher challenge students?)
5. Captivate (does the teacher keep students’ attention?)
6. Confer (does the teacher engage students in discussions?)
7. Consolidate (does the teacher recap/review material?)
Tripod survey - main elements
Not organized by “Cs” to avoid “priming” students
Short, age-appropriate statements that children can understand
Some statements are reverse-coded to contribute to the score for each “C”
Likert scale for children to indicate the extent to which they agree with the statement (sometimes expressed in terms of frequency)
Principal surveys - purpose
systematize principals’ perceptions of teacher quality
Principal surveys - domains assessed
– Overall teaching effectiveness
– Dedication and work ethic
– Organization
– Classroom management
– Raising student achievement (in math and reading)
– Role model for students
– Student satisfaction with teacher
– Parent satisfaction with teacher
– Positive relationship with colleagues
– Positive relationship with administrators
Principal surveys - predictive validity
Principals can predict teacher effectiveness with a single question on their overall effectiveness.
Principals are reluctant to identify poor performers, even when there are no stakes.
Teacher knowledge tests - purpose
measure teachers’ content knowledge, subject-specific pedagogical knowledge, or understanding of student errors
School management quality surveys - purpose
measure quality of school management (usually in the context of interventions to improve governance)
World Management Surveys adapted for education
developed by Bloom, Lemos, Sadun, Van Reenen (2015)
• Management quality measured on:
– Operations
– Monitoring
– Target Setting
– People Management
Instrument recently adapted for developing countries by creating finer gradations in the 5-point scale
What is the main lesson regarding the use of principal surveys to measure teacher effort?
They produce measures of effort that effectively predict student achievement
Principal surveys of teacher effort are remarkably predictive of teacher value added, even though there is reluctance to identify weak performers. It is management indices (not teacher effort) that tend to cluster at low levels of the scale. We have no evidence of whether or not they predict teacher knowledge.
In an impact evaluation of an intervention that gives 4th-grade teachers incentives for improvements in reading, what might we be worried about if we measured learning only using an oral test that measures basic literacy (can the student read a sentence)?
It may be subject to ceiling effects where the distribution is censored at higher levels of achievement
Our literacy test doesn’t measure potential negative side effects of the incentive program
The test may not be able to pick up any differences among the highly literate students. Efforts to improve reading may increase at the expense of time teaching another subject and we are not measuring that. Oral tests can indeed be adaptive (unlike paper-pencil tests).
What is a good test score?
• Appropriate to the context
– Major need for piloting, adaptation of instruments
• Measures what we think it measures
– We want to measure learning, not test-taking skills or speed
• Focused on dimensions that we think the intervention might improve
– Requires thinking carefully about what kind of test domains we want to focus on
– Also requires thinking about how the assessment might be ‘gamed’
What is a good test score? - Distribution
• Continuous well-distributed measure of student achievement
– No ceiling or floor effects
– Not “too easy”, “too hard”, or “too short”
• This Goldilocks zone can often be very hard to achieve!
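One simple way to screen for ceiling or floor effects in pilot data is to check how much of the score distribution piles up at the extremes. A minimal sketch; the function name and the 5% threshold are illustrative assumptions, not standard cutoffs:

```python
# Sketch: flag possible ceiling/floor effects in pilot test data by
# checking the share of students at the maximum or minimum score.
# The 5% threshold is an illustrative assumption.

def check_ceiling_floor(scores, max_score, min_score=0, threshold=0.05):
    n = len(scores)
    share_at_ceiling = sum(s == max_score for s in scores) / n
    share_at_floor = sum(s == min_score for s in scores) / n
    return {
        "ceiling_effect": share_at_ceiling > threshold,
        "floor_effect": share_at_floor > threshold,
        "share_at_ceiling": share_at_ceiling,
        "share_at_floor": share_at_floor,
    }

# Hypothetical pilot: a 20-item test where several students score perfectly
pilot_scores = [20, 20, 20, 19, 18, 15, 12, 10, 8, 5]
result = check_ceiling_floor(pilot_scores, max_score=20)
print(result["ceiling_effect"], result["share_at_ceiling"])  # True 0.3
```

A large mass at the maximum score means the test cannot distinguish among the strongest students, which is exactly the censoring problem described above.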
What is a good test score? - Discrimination
• Tests should be discriminating i.e. informative at all levels of ability
– should be able to distinguish differences in absolute achievement around 10th percentile as well as around median ability
– This is often hard to do:
• PISA, TIMSS etc. not informative at very low achievement levels
• ASER not informative at high achievement levels
What is a good test score? Dynamic comparability
Dynamic comparability means the test allows you to measure the progress of student learning over time.
What is a good test score? - Cross-sectional comparability
Cross-sectional comparability means the test allows you to place a student within a wider distribution of contemporaries: a peer group in the same state, in the same country, or an international peer group.
What is a good test score? - Benchmarking
Benchmarking asks: given an absolute standard of what is considered grade-appropriate competence, how are your students doing relative to that benchmark?
The main purpose of using a common subset of questions that are repeated across tests is to ensure that:
Achievement can be compared across time and samples
When designing a test, how should we think about grade-appropriate tests?
Grade-appropriate tests are particularly inappropriate for many developing country contexts (kids are so far behind in learning)
Try to design a test that contains items targeting a wide distribution of achievement
When designing a test, how should we think about choosing each item?
Each item should map to a concrete skill that we want to test; a subset of items should be repeated across rounds for comparability; and a subset of items should be drawn from other assessments.
When designing a test, how should we think about language?
It should not be assumed that item properties are maintained in translation
3 common ways a test is administered
Individually, group-oral, written
Advantages vs. disadvantages between ways tests are administered
– Individual oral tests are much better for assessing children at young ages, but very burdensome in the field
– Group-oral tests attempt to replicate the above at scale, but classroom management is not easy and answers are less precise
– Written tests are ideal for later grades, but carry a strong possibility of floor effects in primary grades
Cognition test type?
Raven’s matrices
Early Grade Learning test type?
EGRA, ASER
Higher Level Learning test type?
SAT, GMAT
Learning outcomes are often reported in terms of standard deviations rather than raw test scores primarily because:
Doing so allows us to compare results across studies that use different tests
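Standardization works by expressing each score in units of the control group’s standard deviation. A minimal sketch of this normalization, with made-up scores:

```python
# Sketch: convert raw scores to standard deviations of the control-group
# distribution, so effects are comparable across studies with different tests.

from statistics import mean, stdev

def standardize(scores, control_scores):
    mu = mean(control_scores)
    sigma = stdev(control_scores)  # sample standard deviation of the control group
    return [(s - mu) / sigma for s in scores]

# Hypothetical endline scores
control = [40, 50, 60, 50, 50]
treatment = [55, 65, 45, 60, 50]

z_treatment = standardize(treatment, control)
effect_sd = mean(z_treatment)  # control-group mean is normalized to zero
```

Here `effect_sd` is the treatment effect in control-group standard deviations, the quantity typically reported in impact evaluations.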
Item Response Theory
Item Response Theory (IRT) models the probability that an individual with a given ability will answer an item correctly.
It allows you to place students within a common distribution, enabling better cross-sectional and over-time comparisons even as the content of the test changes.
The most important advantage of IRT is the ability to link across tests and over time.
Item characteristic curve
Maps the trait (ability or knowledge) to the proportion correct
guessing parameter
The probability that an examinee with no ability or knowledge will answer a question correctly (i.e., guesses and gets it right by chance).
It is where the curve intersects the y-axis.
Difficulty parameter
How difficult the question is: the level of ability an examinee needs to answer the question correctly with probability (1 + c)/2.
It is the midpoint of the curve on the x-axis; moving the curve to the right increases the difficulty.
Discrimination parameter
A measure of how well the question distinguishes between examinees of different ability/knowledge: how steep the curve is.
If the ICC is much flatter, even students who don’t know much could get the item right, and students who know a lot could get it wrong.
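The three parameters above combine in the standard three-parameter logistic (3PL) item characteristic curve. A minimal sketch; the parameter values are made up for illustration:

```python
# Sketch of a three-parameter logistic (3PL) item characteristic curve:
#   P(correct | ability theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
# a = discrimination (slope), b = difficulty (location), c = guessing (lower asymptote).

import math

def icc(theta, a, b, c):
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta = b (the curve's midpoint), the probability is (1 + c) / 2:
p_mid = icc(theta=0.0, a=1.5, b=0.0, c=0.2)   # 0.6 = (1 + 0.2) / 2

# A very low-ability examinee still answers correctly with probability near c:
p_low = icc(theta=-6.0, a=1.5, b=0.0, c=0.2)  # close to 0.2
```

Increasing `b` shifts the curve right (harder item); increasing `a` steepens it (more discriminating item); `c` sets the floor where the curve meets the y-axis.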
What are we able to do with a test designed with IRT, that we are unable to do with a test that was not designed with IRT?
Report treatment effects in standard deviations relative to the absolute progress made in the control group
Using a simpler non-IRT test, we can report…
Using a simpler non-IRT test, we can report: total scores; totals relative to the control group (simple difference) or relative to the baseline (pre-post); “improvements” relative to the control group (difference-in-difference); improvements as a percentage, with either the baseline, the control-group total, the control-group improvement, or the control-group percentage gain in the denominator; or results as standard deviations (always normalizing the control group to equal zero).
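These non-IRT reporting options can be sketched with a few lines of arithmetic; the scores below are hypothetical:

```python
# Sketch of the non-IRT reporting options, with made-up baseline/endline scores.

from statistics import mean

baseline_treat, endline_treat = [30.0, 40.0], [50.0, 60.0]
baseline_ctrl, endline_ctrl = [30.0, 40.0], [40.0, 50.0]

# Total relative to the control group (simple difference):
simple_difference = mean(endline_treat) - mean(endline_ctrl)          # 10.0

# Improvement relative to the baseline (pre-post):
pre_post = mean(endline_treat) - mean(baseline_treat)                 # 20.0

# Improvement relative to the control group's improvement (difference-in-difference):
diff_in_diff = pre_post - (mean(endline_ctrl) - mean(baseline_ctrl))  # 10.0

# Improvement as a percentage, with the baseline in the denominator:
pct_of_baseline = pre_post / mean(baseline_treat) * 100
```

Each choice of denominator or comparison changes the headline number, which is why studies must state exactly which of these quantities they report.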
With an IRT test, we can report…
With an IRT test, we can do anything a non-IRT test allows AND report results as standard deviations relative to the baseline and the control group, so that the control group’s own progress appears as a positive value.