Data Organization in Spreadsheets Flashcards

1
Q

Spreadsheets are most often used as a multipurpose tool for what?

A

For data entry, storage, analysis and visualization.

2
Q

Most spreadsheet programs allow users to perform all of these tasks (data entry, storage, analysis and visualization). Does the paper recommend performing all of them in a spreadsheet program? Why (not)?

A

No, spreadsheets are best suited for data entry and storage. Analysis and visualization should happen separately. This reduces the risk of contaminating or destroying the raw data in the spreadsheet.

3
Q

The first rule of data organization is to be consistent. Why is this and what does this mean?

A

Entering and organizing your data in a consistent way from the start will prevent you and your collaborators from having to spend time harmonizing the data later.
- Use consistent codes for categorical variables
- Use a consistent fixed code for any missing values
- Use consistent variable names
- Use consistent subject identifiers
- Use consistent data layout in multiple files
- Use consistent file names
- Use consistent format for all dates
- Use consistent phrases in your notes
- Be careful about extra spaces within cells
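
As an illustration of why consistent codes and stray spaces matter, here is a minimal Python sketch of the harmonizing work that inconsistent entry creates later. The `SEX_CODES` mapping and the `clean_category` helper are hypothetical names made up for this example; they are not from the paper.

```python
# Hypothetical mapping from spellings seen in a 'sex' column to one agreed code.
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def clean_category(raw: str) -> str:
    """Strip stray spaces, ignore case, and map the value to the agreed code."""
    stripped = raw.strip()
    # Unknown spellings come back stripped but otherwise unchanged,
    # so they can be reviewed by hand instead of being silently recoded.
    return SEX_CODES.get(stripped.lower(), stripped)


# "male", " Male" and "M " look alike to a person but are three different values to a program.
cleaned = [clean_category(v) for v in ["male", " Male", "M ", "female", "F"]]
print(cleaned)
```

Entering a single code (for example always `male`) from the start makes this cleanup step unnecessary.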

4
Q

The second rule of data organization is to choose good names for things. What is meant by this?

A

It is important to pick good names for things. This can be hard, so it is worth putting some time and thought into it. As a general rule, do not use spaces in variable names or file names. They make programming harder: the analyst will need to surround everything in double quotes, like “glucose 6 weeks”, rather than just writing glucose_6_weeks.
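
A small sketch of how names containing spaces could be repaired after the fact, assuming Python is the analysis language. The helper name `to_safe_name` is hypothetical.

```python
import re


def to_safe_name(name: str) -> str:
    """Replace each run of whitespace in a name with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


safe = to_safe_name("glucose 6 weeks")
print(safe)  # glucose_6_weeks
```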

5
Q

How should dates be written within a spreadsheet?

A

As YYYY-MM-DD
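
For illustration, a minimal Python sketch that converts a date written in another layout to YYYY-MM-DD. The helper `to_iso` and the input layout `%m/%d/%Y` are assumptions for this example; the source layout must be known in advance, because a value such as 12/06/2015 is ambiguous on its own.

```python
from datetime import datetime


def to_iso(text: str, fmt: str) -> str:
    """Parse a date written in the layout `fmt` and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


# Assumes the original entry was month/day/year.
iso = to_iso("12/06/2015", "%m/%d/%Y")
print(iso)  # 2015-12-06

# Dates in YYYY-MM-DD form sort chronologically even as plain text.
ordered = sorted(["2015-12-06", "2015-02-01", "2014-11-30"])
```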

6
Q

Can you leave cells empty?

A

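No, fill in every cell and use one common code for missing data, such as NA or a hyphen. Do not use a numeric code such as -999 or 999, because it is easily mistaken for a real value.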
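
A minimal sketch of how an analyst might read such a file in Python, assuming `NA` is the single agreed code for missing values. The sample data and the `parse_value` helper are made up for illustration.

```python
import csv
import io

# Made-up sample file contents; "NA" marks the one missing measurement.
RAW = "id,glucose\n101,134.9\n102,NA\n103,120.3\n"
MISSING = "NA"


def parse_value(cell: str):
    """Return None for the agreed missing-data code, otherwise the number."""
    return None if cell == MISSING else float(cell)


rows = list(csv.DictReader(io.StringIO(RAW)))
glucose = [parse_value(row["glucose"]) for row in rows]
print(glucose)
```

Because only one code is used, a single comparison is enough to separate missing cells from real measurements.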

7
Q

Can you put more than one piece of information in a cell?

A

No, the cells in your spreadsheet should each contain one piece of data.

8
Q

What is the best layout for your data within a spreadsheet?

A

A single big rectangle with rows corresponding to subjects and columns corresponding to variables.

9
Q

What does it imply if data does not fit into a set of rectangles?

A

That spreadsheets may not be the best format for the data, as spreadsheets are inherently rectangular.

10
Q

What is a data dictionary?

A

A separate file that explains what all of the variables are. It is helpful if this is laid out in a rectangular form, so that the data analyst can make use of it in analyses.

11
Q

What can a data dictionary contain?

A
  • The exact variable name as in the data file
  • A version of the variable name that might be used in data visualizations
  • A longer explanation of what the variable means
  • The measurement units
  • Expected minimum and maximum values
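
The fields listed above could be put to work in an analysis roughly as follows. This is a sketch only: the dictionary entry, its limits, and the `out_of_range` helper are hypothetical and chosen for illustration.

```python
# One hypothetical data dictionary entry with the fields listed above.
DATA_DICTIONARY = [
    {
        "name": "glucose_6_weeks",          # exact variable name in the data file
        "plot_name": "Glucose (6 weeks)",   # version used in visualizations
        "description": "Plasma glucose measured at 6 weeks",
        "units": "mg/dl",
        "min": 50,                          # assumed plausible range
        "max": 500,
    },
]


def out_of_range(rows, dictionary):
    """Return (row index, variable name, value) for values outside the expected range."""
    problems = []
    for entry in dictionary:
        for index, row in enumerate(rows):
            value = row[entry["name"]]
            if value is None:  # missing values are not range-checked
                continue
            if not entry["min"] <= value <= entry["max"]:
                problems.append((index, entry["name"], value))
    return problems


rows = [{"glucose_6_weeks": 134.9}, {"glucose_6_weeks": 1434.0}, {"glucose_6_weeks": None}]
problems = out_of_range(rows, DATA_DICTIONARY)
print(problems)
```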
12
Q

Can the spreadsheet contain calculations and graphs?

A

No, the primary data file should only contain the data and nothing else.

13
Q

Why isn’t it advised to use calculations and graphs in your spreadsheet?

A

If you are doing calculations in your data file, that likely means you are regularly opening it and typing into it. Doing so incurs some risk that you will accidentally type junk into your data.

14
Q

Do not use font color or highlighting as data.
You might be tempted to highlight particular cells with suspicious data, or rows that should be ignored. Or the font or font color might have some meaning. What should you do instead?

A

Add another column with an indicator variable (e.g. ‘trusted’ with values TRUE or FALSE)
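
A short sketch of why an indicator column is more useful than highlighting: it survives export to plain text and can be filtered on directly. The sample data is made up, and the `trusted` column follows the example in the answer above.

```python
import csv
import io

# Made-up sample: the suspicious value is flagged in a column, not by cell color.
RAW = "id,glucose,trusted\n101,134.9,TRUE\n102,1434,FALSE\n103,120.3,TRUE\n"

rows = list(csv.DictReader(io.StringIO(RAW)))
# Keep only the rows whose indicator says the measurement can be trusted.
trusted_ids = [row["id"] for row in rows if row["trusted"] == "TRUE"]
print(trusted_ids)
```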

15
Q

What else is very important to do with your data?

A

Make regular backups of your data, in multiple locations.

16
Q

Regarding the task of data entry, it is important to ensure that the process is as error-free and repetitive-stress-injury-free as possible. How can you do this?

A

By using the ‘data validation’ tool to control the type of data or the values that users can enter into a cell.

17
Q

What kind of copy should you keep of your data file?

A

A copy in plain text format, with comma or tab delimiters.
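
For illustration, a minimal Python sketch of writing a comma-delimited plain text copy with the standard `csv` module. The rows and column names are made up, and the text is written to an in-memory buffer here; in practice it would go to a file.

```python
import csv
import io

# Made-up rows; dates use YYYY-MM-DD and NA marks a missing value.
rows = [
    {"id": "101", "date": "2015-12-06", "glucose": "134.9"},
    {"id": "102", "date": "2015-12-07", "glucose": "NA"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "date", "glucose"], lineterminator="\n")
writer.writeheader()     # first line holds the variable names
writer.writerows(rows)   # one line per row, one field per variable

text = buffer.getvalue()
print(text)
```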