Week 6 - Different Data formats Flashcards

Question 1

Q

What kind of data is .sav?

Answer

A

A binary data file format used by statistical packages such as SPSS.

Question 2

Q

How should we handle missing values?

Answer

A

Either replace them (impute) or remove them.

Question 3

Q

What is the difference between read_csv and read.csv?

Answer

A

read_csv is faster than read.csv and shows its progress while reading. It also automatically parses the different column types.

Question 4

Q

What format is climate data commonly stored in?

Answer

A

NetCDF (Network Common Data Form)

Question 5

Q

What is JSON?

Answer

A

JSON (JavaScript Object Notation) is a text-based data format that has largely replaced XML. It is commonly used on the web and is language independent.
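
As a small illustration of the language independence, here is a minimal Python sketch using the standard `json` module. The record and its field names are made up for this example; any language with a JSON library could read the same text.

```python
import json

# Hypothetical record, invented for illustration only.
record = {"station": "A1", "temps": [12.5, None, 14.0], "valid": True}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(text)                  # None is written as null, True as true
print(restored == record)    # the round trip preserves the data
```

Note how the missing temperature survives the round trip as `null`, so the gap stays visible rather than being silently dropped.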

Question 6

Q

What should we do with missing values in a plot?

Answer

A

Keep them in the plot, but make it obvious that they are missing (e.g. by showing them along the border of the plot).

Question 7

Q

How do we handle missing values, depending on how many there are?

Answer

A

If only a few values are missing, drop the affected cases.
If a variable has many missing values, drop the variable.
If a few values are missing across many variables and cases, impute them.
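
The first two rules can be sketched in a few lines of Python. This is only an illustration: the toy data, the column names, and the 0.5 cut-off for "many" are assumptions, since the card does not define a threshold.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "notes": None},
    {"age": None, "income": 48000, "notes": None},
    {"age": 29, "income": 61000, "notes": "ok"},
    {"age": 41, "income": 58000, "notes": None},
]

def missing_fraction(rows, column):
    """Share of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)

# Illustrative threshold only; the card does not say what counts as "many".
MANY = 0.5

# Drop variables that are mostly missing.
keep_cols = [c for c in rows[0] if missing_fraction(rows, c) <= MANY]

# Then drop the few remaining cases that still contain a missing value.
complete = [
    {c: r[c] for c in keep_cols}
    for r in rows
    if all(r[c] is not None for c in keep_cols)
]

print(keep_cols)
print(len(complete))
```

Here the mostly empty `notes` column is dropped first, so only one case has to be removed afterwards instead of three.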

Question 8

Q

What are the common ways to impute values?

Answer

A

Simple parametric: fill in the mean or median of the observed values.
Simple non-parametric: find the k nearest neighbours with complete values and average them.
Multiple imputation: fit a statistical distribution (e.g. a normal model) and simulate the missing values from it.
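
The three approaches above might look like this in a minimal Python sketch. The height and weight data, the helper `knn_fill`, and the choice of k = 2 are all assumptions made for illustration. The last step only draws several plausible values from a fitted normal model; a full multiple imputation would also analyse each completed data set and pool the results.

```python
import random
from statistics import mean, median, stdev

# Hypothetical toy data: (height_cm, weight_kg); None marks a missing weight.
data = [(150, 50.0), (160, 58.0), (170, 66.0), (180, 90.0), (172, None)]

observed = [w for _, w in data if w is not None]

# Simple parametric: fill with the mean or median of the observed values.
mean_fill = mean(observed)
median_fill = median(observed)

def knn_fill(data, target_height, k=2):
    """Average the weights of the k complete cases closest in height."""
    complete = [(h, w) for h, w in data if w is not None]
    nearest = sorted(complete, key=lambda hw: abs(hw[0] - target_height))[:k]
    return mean(w for _, w in nearest)

# Simple non-parametric: average the k nearest complete neighbours.
knn = knn_fill(data, target_height=172)

# Simulation step: draw several candidate values from a normal model
# fitted to the observed weights (seeded so the run is repeatable).
rng = random.Random(0)
draws = [rng.gauss(mean_fill, stdev(observed)) for _ in range(5)]

print(mean_fill, median_fill, knn)
print(draws)
```

The mean and median ignore the other variable entirely, while the nearest-neighbour fill uses height to pick comparable cases, which is why it gives a different answer here.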

Question 9

Q

What is a contingency table?

Answer

A

A table that counts how often each combination of the levels of two categorical variables occurs.
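
A contingency table can be built by counting pairs. This is a minimal Python sketch; the survey responses and the two variables (smoker, exercise level) are invented for the example.

```python
from collections import Counter

# Hypothetical survey responses: (smoker, exercise level).
responses = [
    ("yes", "low"),
    ("no", "high"),
    ("no", "low"),
    ("yes", "low"),
    ("no", "high"),
]

# Count how often each combination of the two categories occurs.
counts = Counter(responses)

row_levels = sorted({smoker for smoker, _ in responses})
col_levels = sorted({level for _, level in responses})

# Nested dict: one row per smoker level, one column per exercise level.
# Counter returns 0 for combinations that never occur.
table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

for r in row_levels:
    print(r, table[r])
```

Combinations that never occur still get a cell with a count of zero, so the table stays rectangular.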

(9 cards)