Working with Administrative Data Flashcards by Sarah Martin

Which of these are examples of administrative data?

Text from a tweet
Information from a birth certificate
Demographic information collected during a baseline survey
Income information from tax records
None of the above

Both the information from a birth certificate and the income information from tax records are collected during the normal operations of a program, rather than primarily for specific research.

Compared to survey data, administrative data are less susceptible to _______ bias because _______

Recall bias, because data are collected at the time of occurrence

T/F: In the context of randomized evaluations, researchers can obtain administrative data from both public (i.e., governmental) and private institutions.

True

Both public and private institutions have provided individual-level administrative data to researchers for the purposes of randomized evaluations.

With regard to administrative data, the _______ identifiable and sensitive the data that you are asking to have released, the _______ challenging it will be to get those data outside of the agency for research.

more, more

When choosing identifiers for matching study data to administrative data, which of the following identifiers would be preferable to using an individual’s street address?

A government-issued, unique identification number
Date of birth

Both are preferable because they are numerical identifiers, as opposed to identifiers composed of letters and numbers.

The exact/deterministic matching strategy may lead to more ___________, while the fuzzy/probabilistic matching strategy may lead to more ___________.

False negatives, false positives

Fuzzy/probabilistic matching strategies can account for minor discrepancies, but may lead to more false positives. Exact/deterministic matching strategies, on the other hand, do not account for minor discrepancies and may lead to more false negatives.

During the data matching process, the _______ file and the _______ file are combined to create the _______ file

identified finder, administrative data, de-identified analysis

identified finder file

Contains the identifiers of the study sample and a study ID. The study ID is a numeric ID that uniquely identifies each person in the study.

administrative data file

The data provider has the administrative data file that contains identifiers and the outcome variables of interest to the research team.

de-identified analysis file

The data flow process will determine how the identified finder file and the administrative data file are combined to create the de-identified analysis file
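
As a rough illustration of that combination step, the sketch below links a toy finder file to a toy administrative data file and keeps only the study ID and the outcome. Everything in it is hypothetical: the field names (`study_id`, `gov_id`, `earnings`), the sample records, and the helper `build_analysis_file` are assumptions made for the example, and in practice the data flow process determines who runs this step and where.

```python
# Hypothetical identified finder file: identifiers plus a study ID.
finder_file = [
    {"study_id": 1, "gov_id": "A100"},
    {"study_id": 2, "gov_id": "A200"},
    {"study_id": 3, "gov_id": "A300"},
]

# Hypothetical administrative data file: identifiers plus an outcome.
admin_file = [
    {"gov_id": "A100", "earnings": 21000},
    {"gov_id": "A200", "earnings": 18500},
    {"gov_id": "A999", "earnings": 40000},  # not in the study sample
]


def build_analysis_file(finder, admin, key="gov_id"):
    """Link the two files on `key`, then drop `key` from the output."""
    admin_by_key = {row[key]: row for row in admin}
    analysis = []
    for person in finder:
        match = admin_by_key.get(person[key])
        if match is None:
            continue  # unmatched study members are left out of this sketch
        row = {"study_id": person["study_id"]}
        # Copy outcome variables only; the identifier never reaches the output.
        row.update({k: v for k, v in match.items() if k != key})
        analysis.append(row)
    return analysis


analysis_file = build_analysis_file(finder_file, admin_file)
print(analysis_file)
```

Only `study_id` and the outcome survive in `analysis_file`, which is the sense in which the result is de-identified in this toy setup.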

In addition to the data provider, who should sign the Data Use Agreement (DUA)?

An official institutional representative, rather than an individual PI or staff member.

T/F: If the research team never comes into contact with individuals in the study, they do not need to get IRB approval to use the individuals’ administrative data.

False

Even though the research team may never come into contact with the individuals whose information is included in the administrative data, it may still be necessary to complete the IRB process, even if just to confirm “exempt” status.

Reporting bias

Occurs when people have an incentive to under- or over-report information.

Why are administrative data useful?

• The outcomes and metrics required for a study may already be tracked by a government or organization
• Available retrospectively
• Enable long-term follow-up
• Reduce logistical burden
• Include a near census of the relevant population
• Often cheaper than surveys

How do administrative data minimize recall bias?

Data are recorded at the time of occurrence, so no memory is needed (e.g., banking records)

How do administrative data minimize social desirability bias?

Non-self-reported data (e.g., arrest records)

How do administrative data minimize differential attrition and non-response bias?

Near census of relevant population

Identifiable vs. partially de-identified vs. de-identified

Identifiable - very easy to identify individuals

Partially de-identified - more difficult to identify individuals, but still possible, especially with additional knowledge to piece together

De-identified - very difficult or impossible to identify

Exact/Deterministic Matching

• Minor discrepancies are not well accounted for
– E.g., typos in name, reversed day and month in DOB
• Some records are not identified as matches even though they may be true matches (false negatives)

Fuzzy/Probabilistic Matching

Accounts for the likelihood that identifiers may not align exactly with those in a data system
– E.g., SSN and last name match, DOB is off by a month…counts as a match
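
The contrast between the two strategies can be sketched with a toy example that mirrors the case above (SSN and last name agree, DOB is off by a month). The records, the field names, the averaging rule, and the 0.8 threshold are all assumptions chosen for illustration. The similarity score comes from Python's standard `difflib.SequenceMatcher`, which is a simple string comparison and not a full probabilistic linkage model.

```python
from difflib import SequenceMatcher

FIELDS = ("ssn", "last_name", "dob")


def exact_match(a, b):
    """Deterministic rule: every identifier must agree character for character."""
    return all(a[f] == b[f] for f in FIELDS)


def fuzzy_match(a, b, threshold=0.8):
    """Illustrative fuzzy rule: average string similarity must reach `threshold`."""
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in FIELDS]
    return sum(scores) / len(scores) >= threshold


# Hypothetical records: same person, but DOB is off by a month in the admin data.
study = {"ssn": "123456789", "last_name": "Martin", "dob": "1990-04-12"}
admin = {"ssn": "123456789", "last_name": "Martin", "dob": "1990-05-12"}
# A clearly different person, for comparison.
stranger = {"ssn": "987654321", "last_name": "Nguyen", "dob": "1975-11-30"}

print(exact_match(study, admin))     # exact rule misses the link (false negative)
print(fuzzy_match(study, admin))     # fuzzy rule tolerates the discrepancy
print(fuzzy_match(study, stranger))  # unrelated record stays unmatched
```

Lowering the threshold would let more near-misses through, which is where the risk of false positives comes from.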

Differential Coverage Bias

Differential ability to link individuals to administrative records
• Treatment and control are differentially likely to appear in administrative records

To address differential coverage bias

• Collect identifiers for linking during the baseline survey
– To ensure that you are equally likely to be able to link treatment and control individuals to their records
• Identify the data universe
– Which individuals are included in the data and which are excluded, and why?
– To ensure the intervention does not affect the likelihood of appearing in a data set
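
One simple way to see whether linkage differs between groups is to compare the share of study members who were linked to an administrative record in each arm. The sketch below is only an illustration under assumed inputs: the record layout (`arm`, `matched`), the sample values, and the helper `match_rate_by_arm` are hypothetical, and a gap in rates is a prompt for further investigation rather than a formal test.

```python
def match_rate_by_arm(records):
    """Return the share of records linked to administrative data, per study arm."""
    totals, matched = {}, {}
    for r in records:
        arm = r["arm"]
        totals[arm] = totals.get(arm, 0) + 1
        matched[arm] = matched.get(arm, 0) + (1 if r["matched"] else 0)
    return {arm: matched[arm] / totals[arm] for arm in totals}


# Hypothetical linkage results for eight study members.
linkage = [
    {"study_id": 1, "arm": "treatment", "matched": True},
    {"study_id": 2, "arm": "treatment", "matched": True},
    {"study_id": 3, "arm": "treatment", "matched": True},
    {"study_id": 4, "arm": "treatment", "matched": False},
    {"study_id": 5, "arm": "control", "matched": True},
    {"study_id": 6, "arm": "control", "matched": True},
    {"study_id": 7, "arm": "control", "matched": False},
    {"study_id": 8, "arm": "control", "matched": False},
]

rates = match_rate_by_arm(linkage)
print(rates)
```

In this made-up sample the treatment arm links at a higher rate than the control arm, which is the pattern that would raise a differential coverage concern.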

Differential Reporting

• Likelihood of reporting an outcome is correlated with treatment
– True value of the outcome may not differ between treatment and control, but due to the intervention, treatment group is more likely to report a certain outcome or appear in administrative records

To address differential reporting

• Identify how the intervention may affect the reporting of outcomes
– Identify the context in which the data were collected
– Determine the direction in which estimates are likely to be biased
– E.g., does the number of doctor’s visits reflect severity of sickness or a stronger connection to the health system?

To address possibly inaccurate data

• Cross-reference with other sources to ensure accuracy
• Identify the data agency’s quality control protocol
• Choose indicators that are unlikely to be incorrectly reported
– Select variables that are straightforward and less susceptible to human error
– Request raw variables
• Communicate with program or implementing partner responsible for collecting data
– Ask how and why data are collected

Reporting Bias

• From an individual
– E.g., under-reporting income to qualify for a social welfare program
• From an administrative organization
– E.g., schools over-report attendance to meet requirements

To address reporting bias

• Identify the context in which the data were collected
– Were there incentives to misreport information?
• Choose variables that are not susceptible to bias
– E.g., hospital visit v. value of insurance claim