Week 6 - Different Data Formats Flashcards

1
Q

What kind of data is .sav?

A

A binary data file format used by statistical packages, most notably SPSS.

2
Q

How should we handle missing values?

A

Either replace them (impute a value) or remove them.

3
Q

What is the difference between read_csv and read.csv?

A

read_csv (from the readr package) is typically faster than base R's read.csv and shows a progress bar for large files. It also automatically parses column types (e.g. numbers and dates).

4
Q

What format is climate data commonly stored in?

A

netCDF (Network Common Data Form)

5
Q

What is JSON?

A

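JSON (JavaScript Object Notation) is a lightweight text format for exchanging data that has largely replaced XML for that purpose. It is commonly used on the web and is language independent.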
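
As a small illustration of the language independence mentioned above, here is a minimal Python sketch that writes a record to JSON text and reads it back using only the standard library. The record and its field names are hypothetical, invented for this example.

```python
import json

# Hypothetical record, invented purely for illustration.
record = {"station": "A1", "temps": [12.5, None, 14.0], "valid": True}

# Python object -> JSON text (None becomes null, True becomes true).
text = json.dumps(record)

# JSON text -> Python object again.
restored = json.loads(text)

print(text)
print(restored == record)
```

The same text could be parsed by any other language with a JSON library, which is what makes the format convenient for web data.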

6
Q

What should we do with missing values in a plot?

A

Keep them in the plot, but make it obvious that they are missing (e.g. by placing them along the border of the plot).

7
Q

How do we handle missing values?

A

If there are only a few missing values, drop the affected cases.
If a variable has many missing values, drop the variable.
If a few values are missing across many variables and cases, impute them.
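
The first two rules above can be sketched in plain Python. This is only an illustration: the toy data, the variable names and the 50% cutoff for "many" are assumptions made for this example, not values from the course.

```python
# Hypothetical toy data: each dict is a case, None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "fax": None},
    {"age": 29, "income": None, "fax": None},
    {"age": 41, "income": 61000, "fax": None},
    {"age": 55, "income": 48000, "fax": "555-0101"},
]


def missing_share(rows, var):
    """Fraction of cases in which `var` is missing."""
    return sum(r[var] is None for r in rows) / len(rows)


# Assumed cutoff (not from the source): a variable has "many" missing
# values when more than half of its cases are missing.
MAX_MISSING_SHARE = 0.5

# Rule 2: drop variables that are mostly missing.
kept_vars = [v for v in rows[0] if missing_share(rows, v) <= MAX_MISSING_SHARE]
trimmed = [{v: r[v] for v in kept_vars} for r in rows]

# Rule 1: drop the few cases that are still incomplete.
complete = [r for r in trimmed if all(val is not None for val in r.values())]

print(kept_vars)
print(len(complete))
```

Dropping the mostly-missing variable first keeps more cases than dropping incomplete cases straight away, since here only one case has a complete "fax" value.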

8
Q

What are the common ways to impute values?

A

Simple parametric: replace missing values with the mean or median.
Simple non-parametric: find the k nearest neighbours with complete values and average them.
Multiple imputation: fit a statistical distribution (e.g. a normal model) and simulate the missing values from it.
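
The three approaches above can be sketched with the Python standard library. The height and weight data are hypothetical, and the last part shows only the simulation step of multiple imputation under an assumed normal model; a full analysis would also analyse each completed dataset and pool the results.

```python
import random
import statistics

# Hypothetical data: (height_cm, weight_kg); None marks a missing weight.
data = [(160, 55.0), (165, 60.0), (170, 66.0),
        (175, None), (180, 77.0), (185, 82.0)]
observed = [(h, w) for h, w in data if w is not None]
weights = [w for _, w in observed]

# Simple parametric: fill with the mean or median of the observed values.
mean_fill = statistics.mean(weights)
median_fill = statistics.median(weights)


def knn_fill(target_height, observed, k=2):
    """Simple non-parametric: average the weights of the k complete
    cases whose height is closest to the incomplete case."""
    nearest = sorted(observed, key=lambda hw: abs(hw[0] - target_height))[:k]
    return statistics.mean([w for _, w in nearest])


knn_value = knn_fill(175, observed)

# Multiple imputation (simulation step only): draw several plausible
# values from a normal model fitted to the observed weights.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
mu, sigma = statistics.mean(weights), statistics.stdev(weights)
draws = [rng.gauss(mu, sigma) for _ in range(5)]

print(mean_fill, median_fill, knn_value)
print(draws)
```

The nearest-neighbour fill uses information from the other variable (height), whereas the mean and median fills ignore it.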

9
Q

What is a contingency table?

A

A table that cross-classifies two categorical variables and counts how often each combination of categories occurs.
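
A minimal Python sketch of building such a table by counting pairs of categories. The survey responses and category names are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: (smoker, exercise level).
responses = [
    ("yes", "low"), ("no", "high"), ("no", "low"),
    ("yes", "low"), ("no", "high"), ("no", "medium"),
]

# Count how often each combination of categories occurs.
counts = Counter(responses)

row_levels = sorted({r for r, _ in responses})
col_levels = sorted({c for _, c in responses})

# Nested dict: table[row][column] -> count (0 if the pair never occurs).
table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

for r in row_levels:
    print(r, [table[r][c] for c in col_levels])
```

Combinations that never occur still appear in the table with a count of zero, because a Counter returns 0 for unseen keys.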
