Introduction to Measurement Flashcards

1
Q

Data collected in an RCT are typically used to measure…

A

We collect data because it is part of our theory of change, because it can help improve power, because it can help us better measure the CACE, or for generalizability. We collect personally identifiable information and unique IDs only for operational purposes–to track individuals. We do not use these data, per se, in our analysis.

2
Q

Hours spent helping child with homework would be an indicator for which part of the LogFrame?

A

“Hours spent helping child with homework” is an indicator for “Parents get more involved in their children’s education at home,” which in this LogFrame is an outcome.

3
Q

In a log frame, what would be considered a “source of verification”?

A

A source of verification is where the data come from.

Ex: arrest records are the source of data.

4
Q

Which of the following reasons (consistent with our theory of change and the results) explain why we see significantly more road improvements in West Bengal reservation villages than in Rajasthan reservation villages?

A

A final step in our theory of change was that public investments would better reflect women’s priorities in reservation villages. Indeed, in West Bengal, road construction was a higher priority (relative to men), and we saw an increase in road construction. In Rajasthan, this wasn’t the case. Note, moreover, that we do not know how the priorities of women in West Bengal compare to those of women in Rajasthan in absolute terms. According to our theory of change, what matters is the relative priorities between men and women, not the absolute levels of priority among women. For example, if women and men in Rajasthan both place a very high level of priority on road improvement, then we would not expect to see any changes in response to greater women’s representation. Meanwhile, in West Bengal, women rank road improvement as a moderate priority (lower than their counterparts in Rajasthan), but men rank road improvement as a low priority. Here, we would expect to see positive effects on road improvement from greater women’s representation.

5
Q

Which of the following is most likely to be considered primary data in the evaluation of a social program?

  • Tax records to measure income
  • National Oceanic and Atmospheric Administration satellite data to measure rainfall for weather insurance programs.
  • Hospital records to measure health status
  • Census records to measure occupation
  • Online survey to measure approval ratings for a politician
A

National Oceanic and Atmospheric Administration satellite data to measure rainfall for weather insurance programs.

Primary data are collected principally for the purpose of research or evaluation. An online survey is likely conducted as part of an evaluation. Tax, hospital, and census records are collected for administrative or policy purposes, or for other parties. NOAA data are likely collected by climate scientists for climate research, not by social scientists for social programs, but they are still collected for the purpose of research.

6
Q

Our primary research asks whether our school-feeding program leads to better learning outcomes. Our secondary question is whether the impact is larger for those who are malnourished.

To answer our secondary question, when is the best time to collect indicators measuring nourishment?

A

Baseline

Due to randomization, the proportion of malnourished children should be balanced across treatment and control at baseline. As soon as the intervention begins, however, the composition of this group may begin to change because of the intervention. To answer our secondary question, we want to compare statistically similar groups (from the beginning).

7
Q

Kelsey suggests that some accusations claiming that researchers are “experimenting on people” are unjustified because…

A

The program is not being implemented by the researcher and would happen anyway

Kelsey gives the example that if the government is rolling out a program to provide computers in classrooms, it will not necessarily send out forms asking parents for permission. The computers are the “experiment”. Researchers may come in after this decision has already been made to measure outcomes, which itself isn’t the “experiment”.

8
Q

Empowerment is…

Data
An indicator
A response
A construct

A

Empowerment is a concept that has to be distilled into an indicator or question

9
Q

Blood Pressure = 110/71 mm Hg is:

Data
An indicator
A response
A construct

A

Data

110/71 mm Hg is a specific measure–the number for a specific individual. In other words, it is a piece of data. Since it is an anthropometric measure, it is not part of a survey, and does not have a question or response process.

10
Q

Discrimination is:

Data
An indicator
A response
A construct

A

Discrimination is a concept that has to be distilled into an indicator or question

11
Q

Kilograms of rice per hectare is:

Data
An indicator
A response
A construct

A

This is an indicator, probably meant to measure the construct of rice yields. The data or response would be a specific number of kilograms per hectare

12
Q

Outcome: annual consumption, Indicator: food expenditure in last week

This example may have problems with:
Validity
Reliability
Both

A

Validity

13
Q

Outcome: annual consumption, Indicator: food expenditure in last month

This example may have problems with:
Validity
Reliability
Both

A

Validity and Reliability

14
Q

Validity is to Reliability as:

Noise is to Bias
Precision is to Noise
Bias is to Accuracy
Accuracy is to Precision
Precision is to Accuracy
A

Accuracy is to Precision

Validity, like accuracy, is the idea that we’re not systematically missing our target (the truth) in a particular direction. In measurement, our target is our construct. Reliability, like precision, is the idea that each subsequent attempt at measurement (or estimate) is consistently close to prior attempts.
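The analogy can be illustrated with a small simulation. This is a sketch, not course material: the true value, sample sizes, and noise levels below are made up for illustration, assuming numpy is available.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 50.0  # hypothetical true value of the construct we are measuring

# Valid but unreliable: draws are centered on the truth, but noisy
valid_noisy = rng.normal(loc=truth, scale=10.0, size=100_000)

# Reliable but invalid: draws are tightly clustered, but systematically off-target
reliable_biased = rng.normal(loc=truth + 8.0, scale=1.0, size=100_000)

print(round(valid_noisy.mean()), round(reliable_biased.mean()))  # means: near 50 vs. near 58
print(round(valid_noisy.std()), round(reliable_biased.std()))    # spreads: near 10 vs. near 1
```

The first measure is accurate (unbiased) but imprecise; the second is precise but inaccurate, mirroring validity versus reliability.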

15
Q

What are the four stages of the response process?

A

Comprehension, Retrieval, Estimation, Answer

Comprehension: whether the respondent understands what is being asked; Retrieval: pulling the necessary information from memory; Estimation: using judgement to synthesize memories into an answer; Answer: reporting the answer based on the response options given

16
Q

Measurement error can be introduced at which stage(s)?

Indicator selection
Respondent’s comprehension of the question
Retrieval of information
Estimation or judgment
Reporting an answer
A

Measurement error can be introduced at all of these stages, whether it’s a problem with an indicator’s construct validity, or confusion at any stage in the response process

17
Q

The response to the question, “Do you plan to marry your daughters before they are 18 years old?” should be considered:

A fact, because the respondent knows what their plan is today, even if the plan never materializes

A quasi-fact, because plans for marriage are a question about identity that typical categories do not capture

Subjective: because it has to do with an expectation, and at the moment of the responding, is known only to the respondent and cannot be verified

A

Subjective: any expectation is known only to the person responding and cannot be directly observed. It is therefore subjective.

18
Q

A person’s occupation would be considered:

A permanent state of being
A fluctuating state of being
A habitual action or behavior
An episodic action or behavior

A

A fluctuating state of being

A person’s occupation is a state of being in that it is unlikely to change from day to day (it is not a behavior or action); however, it can change at any time.

19
Q

Which of the following questions is meant to measure an “attitude”?

Do you want your daughter to become a doctor?
Do you think your daughter has the ability to become a doctor?
Do you believe your daughter will become a doctor?
Do you think women make good doctors?

A

Do you think women make good doctors?

An attitude is like a belief, but one that also implies a normative judgment. Stating whether someone makes a good doctor is a normative judgment. The others are aspirations, perceptions, or expectations (respectively).

20
Q

Exclusive proxy indicator

A

One that is correlated with a specific construct, and not with other competing constructs.

An exclusive proxy indicator is one that measures the construct we care about, and likely cannot be explained by other factors. For example, pregnancy is an indicator of having been sexually active.

21
Q

What is true about the Kling, Leibman, Katz method of creating a standardized index?

A
  • Each individual component is weighted equally
  • The unit of measurement for response options will not affect the relative weight of a component (e.g. using kilometers versus miles)
  • To increase the relative weight of a particular category within an index (e.g. mobility), one can add components to that category

By standardizing, almost by definition, roughly half of the responses will be negative. The only concern with negatives is that they are consistent with respect to the index: less of something bad should have the same sign as more of something good.
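The unit-invariance property above can be checked directly: standardizing the same hypothetical component recorded in two different units yields identical z-scores. A minimal sketch assuming numpy, with made-up distance data:

```python
import numpy as np

def standardize(x, comparison):
    """KLK-style standardization: demean by the comparison group's mean, divide by its SD."""
    return (x - comparison.mean()) / comparison.std()

km = np.array([1.0, 2.0, 5.0, 10.0])  # hypothetical component: distance traveled, in km
miles = km * 0.621371                 # the identical data, recorded in miles

# Changing units rescales both the deviations and the SD by the same factor,
# so the standardized component is unchanged.
print(np.allclose(standardize(km, km), standardize(miles, miles)))  # True
```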

22
Q

Field-coded question

A

In a field-coded question, surveyors ask an open-ended question, and then record the response using specific response options, similar to a close-ended question

23
Q

Open-ended questions – pros and cons

A

The researchers may not anticipate all of the possible response options, and it might take too long for the surveyor to list all of the response options if presented as a close-ended question.

However, to convert open-ended questions into usable data, one must code each individual’s response into a set of response options, which can be subjective and time-consuming. Because ex-post coding often relies on the judgment of the coder, it can increase the potential for error.

24
Q

Why might we want to use a close-ended categorical response option (i.e. ask people to select the option that reflects the appropriate range in which their response would fall) rather than an open-ended numerical response option (to get a precise number)?

A

If the respondent does not know a precise answer, selecting a category may be more accurate than responding with a precise number

Range-options for certain demographic characteristics (e.g. age range) can provide a bit more anonymity than precise numbers (e.g. birth date)

Categorical responses can be more difficult to analyze linearly because linear relationships are usually based on single numbers.

25
Q

What is the difference between a Likert scale and a numerical rating scale?

A

Likert and numerical rating scales are nearly identical. What makes a numerical rating scale unique is that each response option (and sometimes points between un-labeled options) corresponds to a number.

26
Q

In the past month, how many times have you skipped a meal?

A. 0 times

B. 1-5 times

C. 6-10 times

D. More than 10 times

What problem has been introduced in this survey question?

A

Vagueness: What is the definition of skipping a meal? Anything less than 3 meals per day? What is the definition of a meal?

27
Q

A surveyor asks whether the household has made any large purchases in the past 30 days. The respondent happened to purchase a bicycle 40 days ago, so the respondent replies “a bicycle”.

What is the bias that has been introduced in this example?

A

Telescoping bias occurs when a respondent includes a behavior, action, or event outside of the reference period, and is particularly common with “lumpy purchases”

28
Q

In Country A, there was a study of a government agricultural extension program, where farmers are trained by government agronomists on the benefits of using fertilizer. A number of farmers in the treatment group report in the endline that they used more fertilizer than they actually did because after receiving the extension program, they recognized that using more fertilizer was “the correct answer”

What problem was introduced in this scenario?

A

Social desirability bias occurs when respondents give an answer that they believe is “socially acceptable or desirable”. In this case, it is not a measurement effect, as a measurement effect changes behavior itself, not just the response to the question

29
Q

We are studying the randomized rollout of a government program to provide electricity to villages, and its impact on learning outcomes. In the endline, we use mobile devices to collect data on literacy levels. However, when the endline is complete, we analyze the data and discover that in the control group, a large proportion of that data is missing. We call in our survey team and learn that the mobile devices would often run out of battery, and in some villages there was no place to recharge the device. This may have led to lost data.

Which of the following methods is least likely to introduce bias?

  • Use the data as is since it was collected using the same method in both groups
  • Return to both treatment and control villages and re-conduct the endline with paper surveys
  • Return to the control villages to conduct the endline using paper surveys
  • Return to the control villages with back up mobile chargers and conduct the endline using the same mobile devices
A

Return to both treatment and control villages and re-conduct the endline with paper surveys

Using the original electronic data would likely introduce attrition bias, since villages in the control group are more likely to have missing data because they are less likely to have electricity. Conducting the endline in the control group alone with a different method or at a different time might introduce systematic error that biases our results. Only surveying both groups at the same time with the same method avoids introducing differential error.

30
Q

Intermediate outcomes

A
Changes necessary to achieve the final outcomes.
Usually changes in:
• Knowledge & beliefs
• Attitudes & aspirations
• Capacity & ability
• Decisions, behaviors & actions
31
Q

Purpose of measurement

A

To measure outcomes (long-term, intermediate, first order, second order, inputs, outputs, etc.); covariates (provide background on respondents, classify respondent behaviors, reduce standard errors); treatment compliance (individual and group level; predictors of compliance); heterogeneous treatment effects; and context for external validity

32
Q

What are the four rows common in a Log Frame?

A

Impact (goal/overall objective)
Outcome (project objective)
Outputs
Inputs (activities)

33
Q

What are the four columns common in a Log Frame?

A

Objectives/Hierarchy
Indicators
Sources of Verification
Assumptions/Threats

34
Q

First-order questions in measurement

A
  • What data do you collect?
  • Where do you get it?
  • When do you get it?
35
Q

Where can we get data?

A

• Obtained from other sources
– Publicly available
– Administrative data
– Other secondary data

• Collected by researchers
– Primary data

36
Q

Types and Sources of Data

A

Information provided by a respondent
○ Could be through a survey, exam results, etc.
○ Information about a person, household, possessions

Automatically generated
○ Automatic tollbooths – detailed individual data
○ Or, a sensor picking up data all the time (not about a single person)

Information NOT about a person/household/possessions
○ Pollution monitors, etc.
○ Most likely still an active data collection process, but not based on a person

37
Q

Ways used to collect data on people

A
  • Surveys
  • Exams, tests, etc.
  • Games
  • Vignettes
  • Direct Observation
  • Diaries/Logs
  • Focus groups
  • Interviews
38
Q

Main types of surveys

A

• Interviewer administered
– Paper-based
– Computer-assisted/ Digital
– Telephone-based

• Self-administered
– Paper
– Computer/Digital

39
Q

When to collect data during the evaluation process

A
• Baseline
• During the intervention
– Process, Monitoring of intervention
• Endline
• Follow-up
• Scale-up
• Intervention: M&E
40
Q

Concept of measurement (from construct to data)

A

Construct –> Indicators –> Data Collection (“Response”) –> Data

41
Q

Goals of measurement

A

Accuracy
Unbiasedness
Validity

Precision
Reliability

42
Q

Validity (in theory)

A

How well does the indicator map to the outcome?
(e.g. IQ tests -> intelligence)

Construct –> (Validity) –> Indicators

43
Q

Reliability (in theory)

A

The measure is consistent and precise vs. “noisy”

Construct –> (reliability) –> Indicators –> (reliability) –> Data Collection (“Response”)

44
Q

4 Steps of the Response Process

A
  1. Comprehension of the question
  2. Retrieval of information
  3. Judgement and estimation
  4. Reporting an answer
45
Q

Response Process - Comprehension

A

How well the respondent understands the question

e.g. How many times did you consume rice this month?
Does this mean just pure rice, or any product that contained rice? Rice flour/rice crackers?

46
Q

Response Process - Retrieval

A

When the respondent thinks about the question, and retrieves the information required to answer

Question – When you received your first measles vaccination, on a scale of 1–5, with 1 being painless and 5 being unbearably painful, what was the level of pain?

You probably received this vaccination as a child; it was too long ago to retrieve accurate information, so any data you do collect are likely very inaccurate.

47
Q

Response Process - Estimation/Judgement

A

When the respondent has to estimate or judge the answer (this should be minimized)

For example: I did “X” twice last week, so over the past month… that’s twice a week over four weeks… so probably about 8 times.

48
Q

Response Process - Response

A

Respondent actually gives a verbalized response at this point.

Even after the respondent has gone through comprehension/ retrieval/judgement - there may be some breakdown between the enumerator and respondent.

For example - if you ask about illegal drug use, the respondent might have an accurate answer but give an inaccurate one for fear of implications.

49
Q

Objective vs. subjective facts

A

Objective - facts

Subjective - an opinion, attitude, perception, aspiration, expectations (not verifiable by external observation or records)

Quasi-facts - race, religion, ethnicity, gender
Here, the response could be motivated by objective factors (biology, etc.) or by subjective factors (personal identity)

50
Q

Two ‘branches’ of facts

A

state of being

actions and behaviors

51
Q

Permanent state of being

A

Permanent facts

e.g. date of birth, district of birth

52
Q

Fluctuating state of being

A

Can change at any time (e.g. age, district of residence)

…but sometimes in a predictable way (age, years of education, years of experience); could change due to outside circumstances (household size, marital status, number of children); others fluctuate but at some point become static (highest education level; number of children a woman has had)

53
Q

Habitual behaviors

A

e.g. regularly attending school

Useful when asking about frequency

54
Q

Episodic behaviors

A

One time and/or infrequent behaviors

e.g. the purchase of a TV

55
Q

Key things to consider when trying to measure ‘facts’

A

• Clearly define variables
– What is a household?
– Who can be considered household members?
– What is considered a room in a household? etc.
• Determine the level of precision you need for your study
• It is useful to look at standard questionnaires for framing these questions
• Be aware of local context and culture

56
Q

Subjective questions - what to measure?

A

Beliefs (Cognitive)
Expectations (behavior intentions)
Attitudes (Evaluative)

57
Q

Subjective questions - Beliefs

A

Beliefs (Cognitive) – a set of beliefs about the attitudinal object (e.g. How healthy are cigarettes on the following dimensions?)

58
Q

Subjective questions - Expectations

A

Expectations (behavior intentions) – Respondents’ attitudes in relation to their future actions. (e.g. Do you plan to quit smoking in the next year?)

59
Q

Subjective questions - Attitudes

A

Attitudes (Evaluative) – Evaluation of the object. (e.g. Do you associate smoking with being cool?)

60
Q

How can we think about subjective questions in terms of the outcomes we are looking to measure?

A

Subjective questions are rarely the ultimate impact outcome we intend to measure. However, they are very useful as intermediate outcomes. If individuals are unaware of the dangers of cigarette smoke, we probably need to change their beliefs before expecting them to stop smoking.

Subjective measures can also help us understand the context, and possibly the assumptions, under which our theory of change will work. For example, an information campaign on the dangers of smoking may not be effective if people are already aware of the dangers but think they’re worth the cost of looking cool.

61
Q

Which constructs are particularly hard to measure? What is a common solution?

A

Sensitive questions: respondents will usually not be honest in their answer, even if they know it.

Sometimes the respondent does not know the answer – an unknown.

Use proxy indicators! Rather than directly asking the sensitive question, we can ask about correlated measures. A proxy must be correlated with the construct, and we must ensure the correlation holds (in terms of the outcome we are measuring, etc.).

62
Q

Exclusive indicator

A

Only one proxy indicator is needed to measure the construct.

For example, as rice yields increase, a larger proportion could be used to replant for the following season, or the household may now be able to sell some of that rice in the market or trade it for other goods. Therefore, unless we are certain that all rice grown is consumed, rice yield is not an exclusive indicator of nutrition.

63
Q

Exhaustive indicator

A

We sometimes want an exhaustive list of indicators for our construct: we’re not always confident that a single indicator always moves in the same direction as our construct, so we may not want to rely on one proxy alone.

For example, if we want to know total calorie consumption, we may need a full accounting of all the food our respondents consume. For each indicator or food item, we ask about the quantity consumed in terms of weight or servings, convert it into calories, and add up all the calories at the end to get total caloric intake.
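The exhaustive accounting described above amounts to a conversion and a sum. A minimal sketch: the food items and calories-per-kilogram factors below are hypothetical, chosen only to illustrate the mechanics.

```python
# Hypothetical kcal-per-kilogram conversion factors (illustrative values only)
CALORIES_PER_KG = {"rice": 1300.0, "lentils": 1160.0, "oil": 8840.0}

def total_calories(quantities_kg):
    """Convert each reported food quantity to calories, then sum across all items."""
    return sum(CALORIES_PER_KG[item] * kg for item, kg in quantities_kg.items())

# One respondent's reported consumption, in kilograms
household = {"rice": 10.0, "lentils": 2.0, "oil": 0.5}
print(total_calories(household))  # 19740.0
```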

64
Q

Index to measure construct

A

The middle ground between a single indicator and a comprehensive accounting of everything consumed is an index.

For example, we create a sample of food items–a consumption basket–and weight each item’s contribution to the value of the index in proportion to the amount of that item a typical household consumes.

65
Q

Component

A

An item in an index

66
Q

Equal weighting in indices

A

Sum of all components, no weighting

If we believe our index components are comprehensive, or at least representative

67
Q

Thematic clustering in indices

A

Weight components by themes

68
Q

How to think about positive/negative signs when creating indices?

A

We need to take into account components that might be negatively correlated with the construct.

We don’t want to naively add positive and negative signs of empowerment together and have them cancel each other out. We just need to make sure we adjust the signs so that every component points in the same direction.

69
Q

Opinion-based weighting indices

A

Sometimes, expert judgment is used. For many exams, the teacher may weight a section by how much time was spent on the topic during the semester, or by how important he or she feels it is to getting to the next level of difficulty.

70
Q

Principal Components Analysis
Unobserved Components Model
Seemingly Unrelated Regressions

A

There are statistical methods that weight index components by their actual or potential explanatory power.

What these methods all have in common is that they remove correlation between components so that their latent attributes are not double- or triple-counted when contributing to the construct we care about.

71
Q

Standardized weighting (Kling, Leibman, Katz)

A

A method that standardizes individual components within the index before compiling them; this is also called a z-score index.

So, for example, if one component is ranked on a scale from 1 to 100 and another from 1 to 10, the former’s contribution is not 10 times that of the latter. On the other hand, the more components that are included within a particular theme, the more weight that theme carries in the overall index.

72
Q

Standardized weighting (Kling, Leibman, Katz) - 4 steps

A

• Determine the comparison group against which you will standardize
– Baseline
– Control group of the same round

We can use the entire sample from the baseline, or we can just use the control group observations from the end line. Obviously, if you don’t have a baseline measure, you’ll need to use the control group.

• Standardize individual components of the index
– Standardized variable = (variable – variable mean)/ variable standard deviation

we demean each observation. In other words, we take the average value of that component from the comparison group, either the baseline sample or the end line sample of the control group, and subtract it from each observation. Then we divide the demeaned value by the standard deviation of the comparison group.

• Average components together

• Standardize the final index
– Standardized index = (index – index mean)/ index standard deviation

We subtract the mean value of the entire index–and that’s the entire index, not just the comparison sample–from each observation, and divide by the standard deviation of the index.
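The four steps above can be sketched as a single function. This is a minimal illustration assuming numpy arrays with one column per component (and any components measuring “bads” already sign-flipped), not the authors’ own code; the simulated data are hypothetical.

```python
import numpy as np

def klk_index(components, comparison):
    """Z-score (Kling-Liebman-Katz) index.

    components: (n_obs, k) array for the full sample
    comparison: (m, k) array for the comparison group
                (the baseline sample, or the endline control group)
    """
    # Steps 1-2: standardize each component against the comparison group
    z = (components - comparison.mean(axis=0)) / comparison.std(axis=0)
    # Step 3: average the standardized components together
    index = z.mean(axis=1)
    # Step 4: standardize the final index using the entire index's mean and SD
    return (index - index.mean()) / index.std()

rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(500, 3))  # hypothetical comparison group
sample = rng.normal(0.2, 1.0, size=(1000, 3))  # hypothetical full sample
idx = klk_index(sample, control)
print(np.isclose(idx.mean(), 0.0), np.isclose(idx.std(), 1.0))  # True True
```

By construction, the final index has mean 0 and standard deviation 1 over the full sample.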

73
Q

Common mistakes when creating indices

A
  • Forgetting to recode variables that measure “bads” (so all signs are consistent)
  • Forgetting to recode missing values
  • Forgetting to account for missing values
74
Q

Alternative index methods

A

• Principal component analysis (PCA)
– Reduces the multidimensionality of the data
• Seemingly Unrelated Regression Estimation (SURE)

With principal component analysis or other similar methods, we combine components into themes so as to maximize variation. In other words, we adjust for the inter-component correlation within each theme, and then adjust for the inter-theme correlation at the end.

The SURE method (seemingly unrelated regression estimation) does basically the same thing without creating themes first.
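As a rough sketch of the PCA idea only (not the full themed procedure or SURE), index weights can be taken from the first principal component of the components’ correlation matrix. The data below are simulated and hypothetical, assuming numpy.

```python
import numpy as np

def first_pc_weights(X):
    """Index weights from the first principal component of the correlation matrix.

    Correlated components share variance; weighting by the first principal
    component avoids double-counting their common latent attribute.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each component
    corr = np.corrcoef(Z, rowvar=False)       # inter-component correlation matrix
    _, eigvecs = np.linalg.eigh(corr)         # eigenvalues returned in ascending order
    w = eigvecs[:, -1]                        # eigenvector with the largest eigenvalue
    return w / np.abs(w).sum()                # normalize (eigenvector sign is arbitrary)

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 1))                 # one underlying trait
X = latent + rng.normal(scale=0.3, size=(300, 3))  # three noisy measures of that trait
w = first_pc_weights(X)
print(np.round(np.abs(w), 2))  # three roughly equal weights, since the components are symmetric
```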

75
Q

Types of questions & response types

A

Open-ended questions; Responses: Verbatim, Numeric

Field-coding questions

Close-ended questions: Single response, Multiple responses, response filters, ratings, rankings

76
Q

Open-ended questions

A

Respondents are allowed to talk through responses, rather than respond to pre-selected codes

77
Q

Issues with coding open-ended questions

A
– Coding free response material is:
• Time consuming
• Costly
• Induces coding error
Requires good interviewer skills in recognizing ambiguity of responses and probing (if required)
78
Q

Open-ended verbatim questions

A

• Verbatim responses
– E.g.: What are the most significant health concerns faced by you and your family?
– Best to use when you don’t know too much about the likely responses
– Requires good interviewer skills in recognizing ambiguity of responses and probing (if required)

79
Q

Open-ended numeric questions

A

Numeric responses
– E.g.: What is your age?
– Often associated with demographic variables
– E.g. How many times did you visit the hospital in the past 30 days?
– Units must be explicit
– E.g. How many times did you consume in the past 7 days?

The fact that we’re asking for a numeric response is restricting enough that it’s easy to convert into data.
However, we may run into problems if there are implied units, but those units are not made explicit.

80
Q

Close ended questions

A

Respondents are presented with a set of pre-coded responses to choose from
• Response categories usually generated through cognitive interviews, focus group discussions and pretesting
• Respondents are given both the topic and the dimensions on which answers are wanted

Often field-tested first, to ensure there isn’t a lot of misinterpretation

81
Q

• Single choice responses

A

yes/no or true/false question

Did you work as a hired laborer in the autumn season?
A. Yes
B. No

82
Q

• Multiple choice responses

A

Another example is a typical standardized exam where you fill in the bubble A, B, C, D, or E. Sometimes these are called multiple-choice questions. However, they should not be confused with multiple-choice responses. Because of the potential for confusion, we usually refer to multiple-choice responses as “choose all that apply” or “select all that apply” questions

In which seasons did you work as a hired laborer during this year? (SELECT ALL THAT APPLY)
A. Autumn
B. Spring
C. Summer
D. None
83
Q

Response scales - what 3 types?

A

– Likert
– Numeric
– Frequency

84
Q

benefits to using ranges for response options

A

First, it could mitigate privacy concerns the respondent may have. Or if it’s a question that requires estimation or recall–for example, how much rice did you consume last month–it could help the respondent if they’re unsure of the precise answer.

Forcing them to give a numeric answer rather than choosing a range may lead them to round up or round down, which could bias the results.

Sometimes we want to know the reasoning behind people’s opinions, actions, behaviors, or decisions. Even if we see a pattern amongst a majority of the population, that pattern may not hold for each and every respondent. It would be presumptuous to assume it did.

85
Q

Likert Scale

A

Do you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree?

Central tendency bias

86
Q

Likert - Bipolar scale

A

Sometimes the scale is set up so that the first option is very positive, like strongly agree, and the last option is very negative, like strongly disagree. If it were numeric, it might range from, say, plus 2 to minus 2. This is called a bipolar scale.

87
Q

Likert - Unipolar scale

A

Other times, it's set up so that the last option is equivalent to zero: "do not agree" rather than "strongly disagree." The numeric equivalent might then range from 0 to 4.
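The two codings described in these cards can be written out side by side. This is a hypothetical sketch: the unipolar labels between the endpoints are invented for illustration, since the card only specifies the zero endpoint.

```python
# Hypothetical sketch: numeric codings for bipolar vs. unipolar Likert scales.
# Bipolar: symmetric around a neutral midpoint (+2 to -2).
bipolar = {
    "strongly agree": 2,
    "agree": 1,
    "neither agree nor disagree": 0,
    "disagree": -1,
    "strongly disagree": -2,
}

# Unipolar: the last option is equivalent to zero (0 to 4).
# Intermediate labels here are assumptions, not from the card.
unipolar = {
    "strongly agree": 4,
    "agree": 3,
    "somewhat agree": 2,
    "slightly agree": 1,
    "do not agree": 0,
}

print(bipolar["strongly disagree"], unipolar["do not agree"])  # -2 0
```

The choice matters for interpretation: on a bipolar scale, zero means indifference; on a unipolar scale, zero means absence of the attitude.
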

88
Q

Central tendency bias

A

• One thing to note here is the inclusion of a middle alternative (in this case, the neutral option). A middle alternative offers an indifference point between being for or against a particular view.

Providing a middle alternative has pros and cons. It may be the best option for those who are truly indifferent. However, sometimes respondents will subconsciously default to a middle option because cognitively it is least taxing.

This is known as central tendency bias. Central tendency bias may lead to a disproportionate number of responses taking the middle option, meaning the data won't reflect the true underlying distribution of opinion in our population.

89
Q

Numeric scales

A

If we want a larger spread than a typical five point Likert provides us, or if we want to quantify the responses as a continuous variable, we can use numeric scales.

Numeric scales help us extract a bit more granularity out of our sample if we believe there is true underlying variance in opinions.

So for example, on a scale from 0 to 10, how much do you agree with the following statement?

90
Q

Guttman scale

A

Only two options are given: agree or disagree.

91
Q

Frequency scale

A

Similar to a Likert scale

For example, how often do you visit your child's school? One set of responses could be similar to a Likert scale: never, rarely, sometimes, often, always. These responses represent some implied quantity, but the magnitude of those quantities is subjective.

Alternatively, we can use frequency scales that are closer to reflecting true numbers. For example, daily, weekly, yearly.

Again, this doesn’t give us a precise answer and makes analysis slightly more difficult if we hope to estimate a linear relationship between this variable and some other. But it may make it easier for our respondent to translate a vague estimate into a response.
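If we do want to use a frequency-scale response in a linear model, one option is to convert each category to an approximate count per year. This is a hypothetical sketch; the numeric assignments are assumptions, not survey-standard values.

```python
# Hypothetical sketch: converting frequency categories to approximate
# occurrences per year, so the variable can enter a linear model.
per_year = {"daily": 365, "weekly": 52, "monthly": 12, "yearly": 1, "never": 0}

responses = ["weekly", "monthly", "never"]
annualized = [per_year[r] for r in responses]
print(annualized)  # [52, 12, 0]
```

Note the strong assumption baked in: "weekly" is treated as exactly 52 times per year, even though respondents use these labels loosely. That imprecision is the analysis cost the card refers to.
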

92
Q

Rankings

A

This is particularly useful when respondents do not have an absolute sense of quantity but do have a relative sense: they can sort the response options in some way or another.

Which teaching-learning materials do you use most often? (Rank the top 3)
Options: Workbooks, flipcharts, textbooks, maps, flash cards, games etc.

93
Q

Field coding

A

A version of an open-response question in which the response options are not given to the respondent. The respondent answers in his or her own words, and the surveyor then codes the response using pre-determined response categories.

This is useful when we don’t want to prompt the respondent with possible options, because perhaps out of convenience, they may just select one, several, or all of the responses given to them without even thinking.

This is especially true if the response options might signal to the respondent which answers the researcher considers, quote-unquote, acceptable. However, field coding requires a bit more faith in our surveyors, or at least a bit more skill on their part.

94
Q

Measurement error: Vagueness

A

Vague concepts leave room for respondents to interpret the question in different ways

Make sure to define vague concepts

95
Q

Measurement Error: Completeness

A

The response categories do not include all categories that can be expected as a response

Pilot question to make sure that categories are exhaustive

96
Q

Measurement Error: Negatives

A

Questions that include negatives can be confusing to the respondent and lead to misinterpretations.

Avoid unnecessary negatives

97
Q

Measurement Error: Overlapping Categories

A

The categories overlap each other.

Make sure that all categories are mutually exclusive

98
Q

Measurement Error: Presumptions

A

The question assumes certain things about the respondent

Use filters and skip patterns

99
Q

Measurement Error: Framing effect

A

People react to a particular choice in different ways depending on how it is presented, e.g., preferring options framed as gains over options framed as losses

Try to be neutral when framing questions

100
Q

Measurement Error: Recall Bias

A

People may recall past events or experiences inaccurately or incompletely

You can ask respondents to keep a diary or save their receipts

101
Q

Measurement Error: Anchoring Bias

A

People tend to rely too heavily on the first piece of information seen

Avoid adding anchors to your questions

102
Q

Measurement Error: Telescoping Bias

A

People perceive recent events as being more remote than they are (backward telescoping) and distant events as being more recent than they are (forward telescoping)

Visit once at the beginning of the reference period. Then ask, “since the last time I visited you, have you…?”

103
Q

Measurement Error: Social Desirability Bias

A

Tendency of respondents to answer questions in a manner they believe will be viewed favorably by others, e.g., emphasizing strengths, hiding flaws, or avoiding stigma

Ask indirectly, ensure privacy

104
Q

Measurement (Survey) effects

A
  • Act of being surveyed changes subsequent behavior
  • Particularly relevant for panel surveys where there are multiple interactions with respondents

More data are better for analysis, but measurement effects could change the interpretation of subject behavior. Consider which questions could be asked at endline only, or obtained through non-survey methods

105
Q

Bias is uncorrelated with treatment

A

Imagine we have a measurement instrument that systematically overestimates the value of our outcome, but does so by the same amount in both the treatment and control groups.

If we look at either the difference in our outcome between baseline and endline, or between the treatment and control groups, we’ll estimate the true difference and the true impact without bias.

This is an example of where the magnitude of bias is totally uncorrelated with the treatment assignment.
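This cancellation can be seen in a small simulation. The sketch below is illustrative, with invented numbers: a constant additive bias of 5 is applied to both groups, and the estimated impact still recovers the true effect of 2.

```python
import random

# Hypothetical simulation: constant measurement bias that is the same
# in both groups cancels out of the treatment-control difference.
random.seed(0)
n = 10_000
true_effect = 2.0
bias = 5.0  # instrument overestimates everyone by the same amount

control = [random.gauss(10, 1) for _ in range(n)]
treatment = [random.gauss(10 + true_effect, 1) for _ in range(n)]

# Measured values include the same additive bias in both groups...
measured_c = [y + bias for y in control]
measured_t = [y + bias for y in treatment]

# ...so the bias drops out of the difference in means.
estimate = sum(measured_t) / n - sum(measured_c) / n
print(round(estimate, 1))  # close to the true effect of 2.0
```

The group means are both shifted up by 5, so the levels are wrong, but the difference between them is not.
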

106
Q

Bias is correlated with treatment

A

One group (treatment or control) is over-reporting positive or negative results, leading to an inaccurate assessment of whether or not there was an impact

(e.g., exaggerated reports of positive behavior in the treatment group would make it seem like there was a large impact when, in fact, the estimate is biased)

107
Q

To reduce the chance of bias, for all treatment groups we should collect data with:

A

For all treatment groups we should collect data with:
– Same enumerators, blinded to the treatment assignment
– Same time period
– Same methods
– Same incentives

We want to make sure that the differences we measure between the two groups are due to the treatment and the treatment alone, not to surveyor characteristics or biases.

108
Q

A biased measure will bias our impact estimates. True/False

A

It depends. If the bias differs systematically between the treatment and control groups, it will introduce error. If the bias is identical for both groups, then it may not introduce error.

109
Q

In education policy circles, there is a contentious debate about the role of standardized exams. Opponents argue that standardized exams incentivize teachers to “teach to the test”. In other words, opponents take issue with the _____ of exams as a measure of learning levels

A

Validity.

Opponents take issue with whether standardized exams are a valid measure of learning levels, since one can do well on a test without having mastered the concepts, or vice versa.