Week 6 - Different Data Formats Flashcards
What kind of file is .sav?
A binary data file format used by the statistical package SPSS.
How should we handle missing values?
Either replace (impute) them or remove them.
What is the difference between read_csv and read.csv?
read_csv (from the readr package) is faster than base R's read.csv and shows a progress bar. It also automatically parses column types.
What format is climate data commonly stored in?
netCDF (Network Common Data Form)
What is JSON?
JSON (JavaScript Object Notation) is a lightweight text format for exchanging data that has largely replaced XML. It is commonly used on the web and is language independent.
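A minimal sketch of a JSON round trip, using Python's standard json module. The record and its field names are made up for illustration.

```python
import json

# Hypothetical record, invented for illustration.
record = {"station": "A1", "temps": [12.5, None, 14.0], "active": True}

# Serialise to a JSON string; Python's None becomes JSON null.
text = json.dumps(record)

# Parse it back; any language with a JSON parser could read the same text.
parsed = json.loads(text)
```

The same text could be read from R, JavaScript or any other language with a JSON parser, which is what "language independent" means here.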
What should we do with missing values in a plot?
Keep them in the plot, but make it obvious that they are missing (e.g. by placing them along the border of the plot).
How do we decide whether to drop or impute missing values?
If there are only a few missing values, drop those cases.
If a variable has many missing values, drop the variable.
If a few values are missing across many variables and cases, impute them.
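The first two rules above can be sketched in plain Python. The data, the variable names and the 50% cutoff are all assumptions made for illustration; the flashcards do not say how many missing values count as "many".

```python
# Hypothetical data: None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "notes": None},
    {"age": None, "income": 48000, "notes": None},
    {"age": 29, "income": 61000, "notes": "ok"},
    {"age": 41, "income": 45000, "notes": None},
]

def missing_share(rows, var):
    """Fraction of rows in which `var` is missing."""
    return sum(r[var] is None for r in rows) / len(rows)

# Assumed cutoff: drop a variable when more than half its values are missing.
MAX_MISSING = 0.5
kept_vars = [v for v in rows[0] if missing_share(rows, v) <= MAX_MISSING]

# Drop the many-missing variables, then drop the few rows still incomplete.
reduced = [{v: r[v] for v in kept_vars} for r in rows]
complete = [r for r in reduced if all(x is not None for x in r.values())]
```

Here "notes" is dropped as a variable, and the single row with a missing "age" is dropped as a case.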
What are the common ways to impute values?
Simple parametric: replace with the mean or median of the observed values.
Simple non-parametric: find the k nearest neighbours with complete values and average them.
Multiple imputation: fit a statistical distribution (e.g. a normal model) and simulate the missing values from it.
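A rough sketch of the first two approaches, using only the Python standard library. The function names, the example data and the choice of Euclidean distance are assumptions for illustration; multiple imputation is not shown.

```python
from statistics import mean, median

def impute_simple(values, stat=mean):
    """Replace None with the mean (or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in values]

def impute_knn(rows, target, k=2):
    """Fill missing values in column `target` with the average of that
    column over the k nearest complete rows.

    Distance is Euclidean over the other columns, assumed to be complete.
    """
    complete = [r for r in rows if r[target] is not None]
    result = []
    for r in rows:
        if r[target] is not None:
            result.append(list(r))
            continue

        def dist(other):
            # Skip the target column, which is missing in r.
            return sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(r, other))
                if i != target
            ) ** 0.5

        nearest = sorted(complete, key=dist)[:k]
        filled = list(r)
        filled[target] = mean(n[target] for n in nearest)
        result.append(filled)
    return result

# Mean and median imputation on hypothetical values.
mean_filled = impute_simple([160, None, 170, 180])
median_filled = impute_simple([1, None, 2, 30], stat=median)

# Columns: (height, weight); weight is missing in the last row.
people = [(160, 55.0), (170, 65.0), (185, 90.0), (168, None)]
knn_filled = impute_knn(people, target=1, k=2)
```

The median example shows why the median can be preferable: the outlier 30 would pull a mean-based fill well above the typical values.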
What is a contingency table?
A table that counts how often each combination of categories occurs for two categorical variables.
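A small sketch of building a contingency table with Python's collections.Counter. The observations and the variable meanings (sex, smoker) are invented for illustration.

```python
from collections import Counter

# Hypothetical observations of two categorical variables: (sex, smoker).
observations = [
    ("F", "yes"), ("F", "no"), ("M", "no"),
    ("M", "no"), ("F", "no"), ("M", "yes"),
]

# Count how often each combination of categories occurs.
counts = Counter(observations)

rows = sorted({a for a, _ in observations})
cols = sorted({b for _, b in observations})

# Arrange the counts as a table: one row per sex, one column per smoker status.
table = {a: {b: counts[(a, b)] for b in cols} for a in rows}
```

Each cell of the table holds the number of cases falling into that pair of categories.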