Introduction to Management Flashcards
Data collected in an RCT are typically used to measure
We collect data because it is part of our theory of change, because it can improve statistical power, because it can help us better measure the complier average causal effect (CACE), or because it supports generalizability. We collect personally identifiable information (PII) and unique IDs only for operational purposes: to track individuals. We do not use these data, per se, in our analysis.
Hours spent helping child with homework would be an indicator for which part of the LogFrame?
“Hours spent helping child with homework” is an indicator for “Parents get more involved in their children’s education at home,” which in this LogFrame is an outcome.
In a log frame, what would be considered a “source of verification”?
A source of verification is where the data come from.
Ex: arrest records are the source of data.
Which of the following reasons (consistent with our theory of change and the results) explain why we see significantly more road improvements in West Bengal reservation villages than in Rajasthan reservation villages?
A final step in our theory of change was that public investments would better reflect women’s priorities in reservation villages. Indeed, in West Bengal, road construction was a higher priority for women than for men, and we saw an increase in road construction. In Rajasthan, this was not the case. Note, moreover, that we do not know how the priorities of women in West Bengal compare to those of women in Rajasthan in absolute terms. According to our theory of change, what matters is the relative priorities of men and women, not the absolute level of priority among women. For example, if women and men in Rajasthan both place a very high priority on road improvement, we would not expect to see any change in response to greater women’s representation. Meanwhile, if women in West Bengal rank road improvement as a moderate priority (lower than their counterparts in Rajasthan) but men rank it as a low priority, we would expect greater women’s representation to have a positive effect on road improvement.
Which of the following is most likely to be considered primary data in the evaluation of a social program?
- Tax records to measure income
- National Oceanic and Atmospheric Administration satellite data to measure rainfall for weather insurance programs.
- Hospital records to measure health status
- Census records to measure occupation
- Online survey to measure approval ratings for a politician
National Oceanic and Atmospheric Administration satellite data to measure rainfall for weather insurance programs.
Primary data are collected principally for the purpose of research or evaluation. An online survey is likely part of an evaluation. Tax, hospital, and census records are compiled for administrative, policy, or other purposes. NOAA data are likely collected by climate scientists for climate research, not by social scientists for social programs, but they are still collected for the purpose of research.
Our primary research question asks whether our school-feeding program leads to better learning outcomes. Our secondary question is whether the impact is larger for children who are malnourished.
To answer our secondary question, when is the best time to collect indicators measuring nourishment?
Baseline
Due to randomization, the proportion of malnourished children should be statistically similar in the treatment and control groups at baseline. As soon as the intervention begins, however, the composition of this group may change because of the intervention itself (for example, some children in the treatment group may no longer be malnourished). To answer our secondary question, we want to compare groups that were statistically similar from the beginning.
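A minimal sketch of the comparison the secondary question implies, assuming a small hypothetical dataset in which each child has a treatment flag, a baseline malnourishment flag, and an endline test score. Because the subgroup is defined by the baseline indicator, the program cannot change who belongs to it; the effect is then a difference in mean scores within each subgroup. The function name `subgroup_effect` and the data are illustrative only.

```python
from statistics import mean

# Hypothetical records: (treated, malnourished_at_baseline, endline_score)
children = [
    (1, 1, 70), (1, 1, 74), (0, 1, 58), (0, 1, 62),
    (1, 0, 80), (1, 0, 82), (0, 0, 77), (0, 0, 79),
]

def subgroup_effect(data, malnourished):
    """Difference in mean endline scores (treatment minus control)
    within a subgroup defined by the BASELINE nourishment indicator."""
    treated = [s for t, m, s in data if t == 1 and m == malnourished]
    control = [s for t, m, s in data if t == 0 and m == malnourished]
    return mean(treated) - mean(control)

effect_malnourished = subgroup_effect(children, 1)
effect_nourished = subgroup_effect(children, 0)
print(effect_malnourished, effect_nourished)
```

A real analysis would more likely estimate this with a regression that includes a treatment × baseline-malnourishment interaction and report standard errors; the simple differences here only illustrate why the subgroup is fixed at baseline.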
Kelsey suggests that some accusations that researchers are “experimenting on people” are unjustified because…
The program is not being implemented by the researcher and would happen anyway
Kelsey gives the example that if the government rolls out a program to provide computers in classrooms, it will not necessarily send out forms asking parents for permission. The computers are the “experiment.” Researchers may come in after this decision has already been made to measure outcomes, and measuring outcomes is not itself the “experiment.”
Empowerment is…
Data
An indicator
A response
A construct
A construct
Empowerment is a concept that has to be distilled into an indicator or a survey question.
Blood Pressure = 110/71 mm Hg is:
Data
An indicator
A response
A construct
Data
110/71 mm Hg is a specific measurement: the number for a specific individual. In other words, it is a piece of data. Because it is a physical measurement rather than a survey item, it does not involve a question or a response process.
Discrimination is:
Data
An indicator
A response
A construct
A construct
Discrimination is a concept that has to be distilled into an indicator or a survey question.
Kilograms of rice per hectare is:
Data
An indicator
A response
A construct
This is an indicator, probably meant to measure the construct of rice yields. The data or response would be a specific number of kilograms per hectare.
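To make the construct, indicator, and data distinction concrete, here is a minimal sketch that turns raw field data into the kg/ha indicator. It assumes a hypothetical plot whose harvest is reported in kilograms and whose area is reported in acres; the function name and the numbers are illustrative, not from the source.

```python
ACRE_IN_HECTARES = 0.40468564224  # one international acre expressed in hectares

def rice_yield_kg_per_ha(harvest_kg, plot_area_acres):
    """Indicator: kilograms of rice harvested per hectare for one plot."""
    plot_area_ha = plot_area_acres * ACRE_IN_HECTARES
    return harvest_kg / plot_area_ha

# Data (the "response"): a hypothetical 1.5-acre plot yielding 1,200 kg
yield_kg_per_ha = rice_yield_kg_per_ha(1200, 1.5)
print(round(yield_kg_per_ha, 1))
```

The construct (rice yields) never appears in the code; only the indicator's formula and the specific numbers (the data) do.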
Outcome: annual consumption, Indicator: food expenditure in last week
This example may have problems with:
Validity
Reliability
Both
Validity
Outcome: annual consumption, Indicator: food expenditure in last month
This example may have problems with:
Validity
Reliability
Both
Validity and Reliability
Validity is to Reliability as:
Noise is to Bias
Precision is to Noise
Bias is to Accuracy
Accuracy is to Precision
Precision is to Accuracy
Accuracy is to Precision
Validity, like accuracy, is the idea that we are not systematically missing our target (the truth) in a particular direction. In measurement, our target is our construct. Reliability, like precision, is the idea that each subsequent attempt at measurement (or estimation) is consistently close to prior attempts.
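The accuracy/precision analogy can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: a hypothetical true value of 100, one measurement tool that is systematically 10 units too high but barely noisy (low validity, high reliability), and another that is centered on the truth but very noisy (high validity, low reliability).

```python
import random
from statistics import mean, stdev

random.seed(0)
TRUTH = 100.0  # hypothetical true value of the construct
N = 10_000     # number of repeated measurements

# Biased but precise: always about 10 units too high, with little noise
biased_precise = [TRUTH + 10 + random.gauss(0, 1) for _ in range(N)]
# Unbiased but noisy: centered on the truth, with a lot of noise
unbiased_noisy = [TRUTH + random.gauss(0, 10) for _ in range(N)]

for name, draws in [("biased/precise", biased_precise), ("unbiased/noisy", unbiased_noisy)]:
    bias = mean(draws) - TRUTH  # systematic error (validity problem)
    spread = stdev(draws)       # random error (reliability problem)
    print(f"{name}: bias={bias:.2f}, spread (SD)={spread:.2f}")
```

Averaging many noisy draws recovers the truth, but no amount of averaging removes the systematic 10-unit error of the biased tool.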
What are the four stages of the response process?
Comprehension, Retrieval, Estimation, Answer
Comprehension: whether respondents understand what is being asked; Retrieval: recalling the necessary information from memory; Estimation (judgment): synthesizing those memories into an answer; Answer (reporting): reporting the answer using the response options given.
Measurement error can be introduced at which stage(s)?
Indicator selection
Respondent’s comprehension of the question
Retrieval of information
Estimation or judgment
Reporting an answer
Measurement error can be introduced at all of these stages, whether through a problem with an indicator’s construct validity or through confusion at any stage of the response process.
The response to the question, “Do you plan to marry your daughters before they are 18 years old?” should be considered:
A fact, because the respondent knows what their plan is today, even if the plan never materializes
A quasi-fact, because plans for marriage are a question about identity that typical categories do not capture
Subjective, because it has to do with an expectation and, at the moment of responding, is known only to the respondent and cannot be verified
Any expectation is known only to the person responding and cannot be directly observed. It is therefore subjective.
A person’s occupation would be considered:
A permanent state of being
A fluctuating state of being
A habitual action or behavior
An episodic action or behavior
A fluctuating state of being
A person’s occupation is a state of being in that it is unlikely to change from day to day (it is not a behavior or action); however, it can change at any time.
Which of the following questions is meant to measure an “attitude”?
Do you want your daughter to become a doctor?
Do you think your daughter has the ability to become a doctor?
Do you believe your daughter will become a doctor?
Do you think women make good doctors?
Do you think women make good doctors?
An attitude is like a belief, but it also implies a normative judgment. Stating whether women make good doctors is a normative judgment. The other questions measure aspirations, perceptions, and expectations, respectively.
Exclusive proxy indicator
One that is correlated with a specific construct, and not with other competing constructs.
An exclusive proxy indicator is one that measures the construct we care about, and likely cannot be explained by other factors. For example, pregnancy is an indicator of having been sexually active.
What is true about the Kling, Liebman, and Katz method of creating a standardized index?
- Each individual component is weighted equally (correct)
- The unit of measurement for response options will not affect the relative weight of a component (e.g., using kilometers versus miles) (correct)
- To increase the relative weight of a particular category within an index (e.g. mobility) one can add components to that category
Because standardized values are centered on the mean, many responses (roughly half, for a symmetric distribution) will be negative. The only concern with negative values is that signs are consistent across the components of the index: less of something bad should have the same sign as more of something good.
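A minimal sketch of a Kling, Liebman, and Katz style index, assuming the common construction: standardize each component against the control group's mean and standard deviation, flip the sign of components where more is bad, then take the equally weighted average of the z-scores. The component names, data, and the `klk_index` function are hypothetical, and details such as handling missing components are not reproduced here.

```python
from statistics import mean, stdev

def klk_index(rows, is_control, signs):
    """Sketch of a Kling-Liebman-Katz style summary index.

    rows: list of dicts, one per respondent, mapping component name -> value
    is_control: list of bools, True if that respondent is in the control group
    signs: dict mapping component name -> +1 if more is good, -1 if more is bad
    """
    # Control-group mean and SD for each component
    control_stats = {}
    for comp in signs:
        control_values = [r[comp] for r, c in zip(rows, is_control) if c]
        control_stats[comp] = (mean(control_values), stdev(control_values))

    index = []
    for r in rows:
        # Sign-adjusted z-scores so that "better" always points the same way
        z_scores = [
            signs[comp] * (r[comp] - m) / sd
            for comp, (m, sd) in control_stats.items()
        ]
        # Equal weight on every component
        index.append(mean(z_scores))
    return index

# Hypothetical data: distance travelled (more is good) and days ill (more is bad)
rows = [
    {"distance_travelled": 2.0, "days_ill": 4},
    {"distance_travelled": 4.0, "days_ill": 2},
    {"distance_travelled": 6.0, "days_ill": 0},
    {"distance_travelled": 8.0, "days_ill": 1},
]
is_control = [True, True, True, False]
signs = {"distance_travelled": +1, "days_ill": -1}

index_km = klk_index(rows, is_control, signs)

# Same data with distance converted from kilometers to miles
rows_miles = [{**r, "distance_travelled": r["distance_travelled"] * 0.621371} for r in rows]
index_miles = klk_index(rows_miles, is_control, signs)
```

The second call illustrates the unit-invariance claim: converting kilometers to miles rescales both the control mean and SD, so the z-scores, and hence the index, are unchanged up to floating-point rounding.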
Field-coded question
In a field-coded question, surveyors ask an open-ended question, and then record the response using specific response options, similar to a close-ended question
Open-ended questions: pros and cons
Pros: the researchers may not anticipate all of the possible response options, and it might take too long for the surveyor to list all of the response options if the question were presented as close-ended.
Cons: to convert open-ended questions into usable data, one must code each individual’s response into possible response options, which can be subjective and time-consuming. Because ex-post coding often relies on the judgment of the coder, it may increase the potential for error.
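A minimal sketch of ex-post coding, assuming a hypothetical codebook for an open-ended question such as "Why did your child miss school?". The keyword rule is a swapped-in illustration, not the method from the source: in practice, trained human coders usually assign categories, which is where coder judgment (and error) enters. Anything the rule cannot classify is flagged for review.

```python
import re

# Hypothetical codebook: response category -> whole words that signal it
CODEBOOK = {
    "cost": {"expensive", "price", "afford", "fees"},
    "distance": {"far", "distance", "travel"},
    "quality": {"quality", "teacher", "absent"},
}

def code_response(text):
    """Return every category whose keywords appear as whole words in the
    free-text answer, or ["uncoded"] if none match (for a human coder)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = [cat for cat, keys in CODEBOOK.items() if words & keys]
    return matches or ["uncoded"]

print(code_response("The school is too far"))
print(code_response("Fees are too expensive, and it is far"))
print(code_response("He helps on the farm"))
```

Matching whole words avoids coding "farm" as "far", but the third answer still needs a human decision about which category, if any, it belongs to.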
Why might we want to use a close-ended categorical response option (i.e. ask people to select the option that reflects the appropriate range in which their response would fall) rather than an open-ended numerical response option (to get a precise number)?
If the respondent does not know a precise answer, selecting a category may be more accurate than responding with a precise number
Range-options for certain demographic characteristics (e.g. age range) can provide a bit more anonymity than precise numbers (e.g. birth date)
Categorical responses can be more difficult to analyze linearly because linear relationships are usually based on single numbers.
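One common workaround when categorical responses must enter a linear analysis is to recode each range to an approximate number, such as its midpoint. A minimal sketch, assuming hypothetical meal-skipping categories; the top category is open-ended, so the value assigned to it (12 here) is an arbitrary analyst choice that can affect results.

```python
from statistics import mean

# Hypothetical categories for "How many times did you skip a meal in the past month?"
# Each closed range maps to its midpoint; the open-ended top category has no
# midpoint, so 12 is an assumed value.
CATEGORY_VALUES = {
    "0 times": 0,
    "1-5 times": 3,
    "6-10 times": 8,
    "More than 10 times": 12,
}

def to_numeric(responses):
    """Convert categorical range responses into approximate numbers."""
    return [CATEGORY_VALUES[r] for r in responses]

answers = ["0 times", "1-5 times", "1-5 times", "More than 10 times"]
values = to_numeric(answers)
average = mean(values)
print(values, average)
```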
What is the difference between a Likert scale and a numerical rating scale?
Likert and numerical rating scales are nearly identical. What makes a numerical rating scale unique is that each response option (and sometimes points between unlabeled options) corresponds to a number.
In the past month, how many times have you skipped a meal?
A. 0 times
B. 1-5 times
C. 6-10 times
D. More than 10 times
What problem has been introduced in this survey question?
Vagueness: What is the definition of skipping a meal? Anything less than 3 meals per day? What is the definition of a meal?
A surveyor asks whether the household has made any large purchases in the past 30 days. The respondent happened to purchase a bicycle 40 days ago, so the respondent replies “a bicycle”.
What is the bias that has been introduced in this example?
Telescoping bias occurs when a respondent includes a behavior, action, or event from outside the reference period; it is particularly common with “lumpy” purchases.
In Country A, a study evaluated a government agricultural extension program in which government agronomists train farmers on the benefits of using fertilizer. At endline, a number of farmers in the treatment group report that they used more fertilizer than they actually did because, after receiving the extension program, they recognized that using more fertilizer was “the correct answer.”
What problem was introduced in this scenario?
Social desirability bias occurs when respondents give an answer that they believe is “socially acceptable or desirable.” In this case, it is not a measurement effect, because a measurement effect changes the behavior itself, not only the response to the question.
We are studying the randomized rollout of a government program to provide electricity to villages, and its impact on learning outcomes. In the endline, we use mobile devices to collect data on literacy levels. However, when the endline is complete, we analyze the data and discover that in the control group, a large proportion of that data is missing. We call in our survey team and learn that the mobile devices would often run out of battery, and in some villages there was no place to recharge the device. This may have led to lost data.
Which of the following methods is least likely to introduce bias?
- Use the data as is since it was collected using the same method in both groups
- Return to both treatment and control villages and re-conduct the endline with paper surveys
- Return to the control villages to conduct the endline using paper surveys
- Return to the control villages with back up mobile chargers and conduct the endline using the same mobile devices
Return to both treatment and control villages and re-conduct the endline with paper surveys
Using the original electronic data would likely introduce attrition bias, since control villages, which are less likely to have electricity, are more likely to have missing data. Conducting the endline in the control group alone, with a different method or at a different time, might introduce systematic error that biases our results. Only surveying both groups at the same time with the same method ensures that any measurement error does not differ systematically between treatment and control.
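Before deciding how to respond, a quick diagnostic is to compare the share of missing outcomes by treatment arm; a large gap signals differential attrition. A minimal sketch with hypothetical records, where the field names `arm` and `literacy` and the `missing_rate` helper are assumptions for illustration.

```python
def missing_rate(records, arm):
    """Share of records in the given arm whose endline outcome is missing (None)."""
    group = [r for r in records if r["arm"] == arm]
    missing = sum(1 for r in group if r["literacy"] is None)
    return missing / len(group)

# Hypothetical endline records; None marks data lost when devices ran out of battery
records = [
    {"arm": "treatment", "literacy": 0.8},
    {"arm": "treatment", "literacy": 0.6},
    {"arm": "treatment", "literacy": 0.7},
    {"arm": "treatment", "literacy": None},
    {"arm": "control", "literacy": None},
    {"arm": "control", "literacy": 0.5},
    {"arm": "control", "literacy": None},
    {"arm": "control", "literacy": None},
]

treatment_missing = missing_rate(records, "treatment")
control_missing = missing_rate(records, "control")
differential = control_missing - treatment_missing
print(treatment_missing, control_missing, differential)
```

In a real evaluation, one would typically test whether the difference in missingness is statistically significant (e.g., by regressing a missing-data indicator on treatment) rather than eyeball raw shares.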
Intermediate outcomes
Changes necessary to achieve the final outcomes. Usually changes in:
• Knowledge & beliefs
• Attitudes & aspirations
• Capacity & ability
• Decisions, behaviors & actions
Purpose of measurement
To measure outcomes (long-term, intermediate, first order, second order, inputs, outputs, etc.); covariates (to provide background on respondents, classify respondents’ behaviors, and reduce standard errors); treatment compliance (individual & group level; predictors of compliance); heterogeneous treatment effects; and context for external validity
What are the four rows common in a Log Frame?
Impact (goal/overall objective)
Outcome (project objective)
Outputs
Inputs (activities)
What are the four columns common in a Log Frame?
Objectives/Hierarchy
Indicators
Sources of Verification
Assumptions/Threats
First-order questions in measurement
- What data do you collect?
- Where do you get it?
- When do you get it?
Where can we get data?
• Obtained from other sources
– Publicly available
– Administrative data
– Other secondary data
• Collected by researchers
– Primary data
Types and Sources of Data
Information provided by a respondent
○ Could be through a survey, exam results, etc.
○ Information about a person, household, possessions
Automatically generated
○ Automatic tollbooths – detailed individual data
○ Or, a sensor picking up data all the time (not about a single person)
Information NOT about a person/household/possessions
○ Pollution monitors, etc.
○ Most likely still an active data collection process, but not based on a person
Ways used to collect data on people
- Surveys
- Exams, tests, etc.
- Games
- Vignettes
- Direct Observation
- Diaries/Logs
- Focus groups
- Interviews
Main types of surveys
• Interviewer administered
– Paper-based
– Computer-assisted/ Digital
– Telephone-based
• Self-administered
– Paper
– Computer/Digital
When to collect data during the evaluation process
• Baseline
• During the intervention
– Process, Monitoring of intervention
• Endline
• Follow-up
• Scale-up
• Intervention: M&E
Concept of measurement (from construct to data)
Construct –> Indicators –> Data Collection (“Response”) –> Data
Goals of measurement
• Accuracy
– Unbiasedness
– Validity
• Precision
– Reliability
Validity (in theory)
How well does the indicator map to the outcome?
(e.g. IQ tests -> intelligence)
Construct –> (Validity) –> Indicators
Reliability (in theory)
The measure is consistent and precise vs. “noisy”
Construct –> (reliability) –> Indicators –> (reliability) –> Data Collection (“Response”)