Data Management Flashcards

Question 1

Q

Respondents’ Unique IDs can potentially be used to:

Answer

A

Link respondents’ personally identifiable information to their responses

Link tables with different structures in a relational database

Link raw data to analysis code and to analysis output

Question 2

Q

The metadata we track for the data collection process includes:

Answer

A

Surveyor assignments
Completion rate
Surveyor attrition

Question 3

Q

Master code files are meant specifically to:

Answer

A

Run (call) all other coding files in the project

Question 4

Q

We use relative references in our code so that:

Answer

A

We do not need to repeat the full file path of the working directory for each file used or created

Different analysts who have different locations for their project folder do not need to change the file path for each file
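
A minimal Python sketch of this idea, assuming a hypothetical project layout (the folder names, file name, and analyst locations below are illustrative and not from the source). Each analyst sets the project root once, and every other path is built relative to it.

```python
from pathlib import PurePosixPath

def project_paths(root):
    """Build every project path relative to a single root folder."""
    root = PurePosixPath(root)
    return {
        "raw": root / "data" / "raw" / "survey.csv",      # hypothetical file name
        "clean": root / "data" / "clean" / "survey.csv",  # hypothetical file name
        "tables": root / "output" / "tables",             # hypothetical folder
    }

# Each analyst changes only the root; nothing else in the code needs editing.
root_a = "/home/analyst_a/project"           # hypothetical location
root_b = "/Users/analyst_b/Dropbox/project"  # hypothetical location
paths_a = project_paths(root_a)
paths_b = project_paths(root_b)

# The part of the path below the root is identical for both analysts.
rel_a = paths_a["raw"].relative_to(root_a)
rel_b = paths_b["raw"].relative_to(root_b)
print(rel_a == rel_b)
```

Keeping the root in one place means a new collaborator edits a single line rather than every file reference.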

Question 5

Q

When publishing data, when should the codebook be created?

Answer

A

After the dataset is final, before data publication

The codebook describes the data: variable names, labels, question text, and summary statistics such as the mean, minimum, and maximum values. Because variables may be generated during analysis, and summary statistics may change after certain cleaning decisions, it is best to produce the codebook at the very end, when the datasets are final.

Question 6

Q

Which documents are included in the “manual” for the published data?

Answer

A

ReadMe file

Codebook

Question 7

Q

According to Gentzkow and Shapiro, rather than naming the latest version of a file regressions_022713_mg.do, one should instead:

Answer

A

Use version control software rather than putting dates in file names

Question 8

Q

What is required to merge two datasets?

Answer

A

There must be a relational parameter or “foreign key”, i.e. a variable shared by both datasets on which to merge them

Question 9

Q

Merge

Answer

A

A horizontal combination of datasets, matched on a unique ID

Adds variables to the existing observations
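
A minimal sketch of a merge in plain Python, assuming two small made-up datasets that share an `id` variable. The function name, the data, and the choice to keep only IDs found in both datasets are assumptions for illustration.

```python
def merge_on_id(left, right, key="id"):
    """Horizontally combine two lists of records that share a key variable."""
    right_by_id = {row[key]: row for row in right}
    if len(right_by_id) != len(right):
        raise ValueError(f"{key!r} is not unique in the right-hand dataset")
    merged = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:  # keep only IDs present in both datasets
            merged.append({**row, **match})
    return merged

# Hypothetical data: each dataset has one row per respondent.
demographics = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 27}]
responses = [{"id": 1, "income": 1200}, {"id": 3, "income": 800}]

merged = merge_on_id(demographics, responses)
print(merged)
```

Respondent 2 has no row in `responses`, so this sketch drops it. Whether to keep unmatched observations is a design choice the analyst should make explicitly.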

Question 10

Q

Append

Answer

A

A vertical combination of datasets that have variables in common (at least a subset), with the same variable names and data types

Adds observations to the existing variables
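
A minimal sketch of an append in plain Python, using made-up data from two hypothetical survey rounds. As a simplification, it requires the two datasets to have exactly the same variables, and it checks names and types on the first row only.

```python
def append_rows(top, bottom):
    """Vertically stack two datasets that share the same variables."""
    if top and bottom:
        top_vars, bottom_vars = set(top[0]), set(bottom[0])
        if top_vars != bottom_vars:
            raise ValueError(f"variable names differ: {sorted(top_vars ^ bottom_vars)}")
        for name in top_vars:
            # Simplification: compare types on the first row of each dataset only.
            if type(top[0][name]) is not type(bottom[0][name]):
                raise TypeError(f"variable {name!r} has different data types")
    return top + bottom

# Hypothetical data: same variables, different respondents.
round1 = [{"id": 1, "score": 7}, {"id": 2, "score": 9}]
round2 = [{"id": 3, "score": 5}]

combined = append_rows(round1, round2)
print(len(combined))
```

The number of variables stays the same while the number of observations grows.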

Question 11

Q

Master file

Answer

A

A file that runs ALL code in your project

Useful for:
– Setting any globals that might be used across do-files
– Installing user-written commands
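
The source describes master files in terms of Stata do-files. The sketch below is a loose Python analogue, not the source's method: the step functions, settings, and paths are all hypothetical. It shows the two roles named above: defining shared settings once and running every step in order.

```python
# Hypothetical pipeline steps; in a real project each would live in its own script.
def clean_data(settings):
    return f"clean data from {settings['raw_dir']}"

def run_regressions(settings):
    return f"run regressions with seed {settings['seed']}"

def make_tables(settings):
    return f"write tables to {settings['output_dir']}"

# Settings shared by every step, defined once (the analogue of globals in a master do-file).
SETTINGS = {"raw_dir": "data/raw", "output_dir": "output/tables", "seed": 12345}

# The master list: every step in the project, in the order it must run.
PIPELINE = [clean_data, run_regressions, make_tables]

def run_all(pipeline, settings):
    """Run every step in order and return a log of what was executed."""
    return [step(settings) for step in pipeline]

log = run_all(PIPELINE, SETTINGS)
for line in log:
    print(line)
```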

Question 12

Q

Codebook

Answer

A

• Contains information about the data: variable names, labels, question text, min/max values, etc.
• Critical for easy interpretation of the data and for further analysis
• Have a do-file that creates the codebook from the raw data
• When: Created once the dataset is final
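
A minimal sketch of generating a codebook from code rather than by hand, written in plain Python with made-up data. The variable names, labels, and the choice of summary statistics are assumptions for illustration; the source refers to a do-file for this step.

```python
def build_codebook(rows, labels):
    """Summarise each variable: label, number of non-missing values, min, max, mean."""
    codebook = []
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] is not None]
        entry = {"variable": name, "label": labels.get(name, ""), "n": len(values)}
        # Numeric summaries only make sense for numeric variables.
        if values and all(isinstance(v, (int, float)) for v in values):
            entry.update(min=min(values), max=max(values), mean=sum(values) / len(values))
        codebook.append(entry)
    return codebook

# Hypothetical final dataset; None marks a missing value.
final_data = [
    {"id": 1, "age": 34, "village": "A"},
    {"id": 2, "age": 51, "village": "B"},
    {"id": 3, "age": None, "village": "A"},
]
labels = {"id": "Respondent unique ID", "age": "Age in years", "village": "Village code"}

codebook = build_codebook(final_data, labels)
for entry in codebook:
    print(entry)
```

Because the summary statistics are computed from the data, rerunning the script after any late cleaning decision keeps the codebook in sync with the final dataset.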

Question 13

Q

ReadMe Files

Answer

A

• Outlines key information about all published files: data and analysis files, questionnaires, codebooks
– E.g. format of the data (such as number of observations per student, number of variables)
• Describes how data/analysis files interact with one another, e.g. which came first, or whether one is a subset of another
• When: Immediately after each round of data collection