exploratory data analysis (weeks 1-2) Flashcards
types of data analysis
descriptive, inferential, predictive
what is descriptive analysis
summarizes the data and highlights any patterns, using measures of central tendency, dispersion, and the shape of the distribution
what is inferential analysis
collect a sample that represents the wider population, then use it to estimate population parameters and test hypotheses
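A minimal sketch of the two inferential tasks on this card: estimating a parameter (the population mean) and testing a hypothesis about it. It assumes a large-sample normal approximation; for small samples a t-distribution would be more appropriate, and Python's `statistics` module does not provide one. The helper names and sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, level=0.95):
    """Estimate the population mean with a normal-approximation confidence interval."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * s / math.sqrt(n)
    return xbar, (xbar - half_width, xbar + half_width)

def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: population mean == mu0 (large-sample z-test)."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

sample = [48, 51, 50, 49, 52]  # hypothetical sample from the wider population
xbar, (lower, upper) = mean_confidence_interval(sample)
p_value = z_test_p_value(sample, mu0=50)
```

A small p-value (for example below 0.05) would be taken as evidence against the hypothesized mean.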
what is predictive analysis
uses past data to make predictions; the data are usually divided into a training set (to fit the model) and a testing set (to check its predictions)
process of data analysis
a. develop a well-defined set of objectives for the analysis
b. identify the data required
c. collect the data (from external and internal sources)
d. process, format, and clean the data
e. perform exploratory (preliminary) data analysis
f. fit a model to the data
g. communicate the results
h. monitor the process, updating the data and analysis as new experience emerges
i. comply with professional guidance and legal requirements
4 data sources?
a. simple random sampling
b. stratified sampling
c. cross-sectional data
d. longitudinal data
what is simple random sampling?
each member of the population is chosen at random, so each has an equal chance of being selected
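A minimal sketch of simple random sampling using Python's standard `random` module. The population of 100 policy IDs is hypothetical; `random.sample` draws without replacement, giving every member the same chance of selection.

```python
import random

population = [f"policy_{i}" for i in range(1, 101)]  # hypothetical population of 100 units
rng = random.Random(42)  # fixed seed so the draw can be reproduced
sample = rng.sample(population, k=10)  # 10 distinct units, each equally likely to be chosen
```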
what is stratified sampling?
split the population into groups (strata) according to specific criteria, then select randomly within each group
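A minimal sketch of stratified sampling. It assumes proportional allocation (each stratum contributes the same fraction of its members), which is one common choice rather than the only one; `stratified_sample` and the region-based population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Group units into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of (id, region): 60 in "north", 40 in "south".
population = [(i, "north" if i < 60 else "south") for i in range(100)]
sample = stratified_sample(population, stratum_of=lambda unit: unit[1], fraction=0.1)
```

Each region is guaranteed to be represented in proportion to its size, which a simple random sample does not ensure.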
what is cross-sectional data?
the variables of interest are recorded for all subjects at a single point in time
what is longitudinal data?
the variables of interest for a particular subject are recorded repeatedly over time
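A minimal sketch contrasting the two data shapes from the last two cards. The records, field names, and helper functions are hypothetical: cross-sectional data has many subjects at one date, longitudinal data follows the same subject across several dates.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cross-sectional data: several variables for many subjects at ONE date.
cross_sectional = [
    {"subject": "A", "date": date(2024, 1, 1), "age": 34, "income": 41000},
    {"subject": "B", "date": date(2024, 1, 1), "age": 51, "income": 52000},
    {"subject": "C", "date": date(2024, 1, 1), "age": 27, "income": 30000},
]

# Hypothetical longitudinal data: the same subject followed at several dates.
longitudinal = [
    {"subject": "A", "date": date(2022, 1, 1), "age": 32, "income": 38000},
    {"subject": "A", "date": date(2023, 1, 1), "age": 33, "income": 39500},
    {"subject": "A", "date": date(2024, 1, 1), "age": 34, "income": 41000},
]

def observation_dates(records):
    """Map each subject to the set of dates on which it was observed."""
    dates = defaultdict(set)
    for r in records:
        dates[r["subject"]].add(r["date"])
    return dates

def is_cross_sectional(records):
    """True if every record was taken at the same single point in time."""
    return len({r["date"] for r in records}) == 1

def is_longitudinal(records):
    """True if at least one subject is observed at more than one date."""
    return any(len(d) > 1 for d in observation_dates(records).values())
```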
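types of issue in the collected data (3)
a. censored data (the value is only partially known, e.g. known only to exceed some limit)
b. truncated data (values outside a certain range are not recorded at all)
c. big data (very large data sets that may need machine learning to analyze)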
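A minimal sketch of the difference between censoring and truncation, assuming a hypothetical recording limit of 100 and made-up true values. Censoring keeps every observation but records values above the limit only as "at least the limit"; truncation drops those observations entirely.

```python
LIMIT = 100  # hypothetical recording limit (e.g. a policy maximum)
true_values = [20, 75, 130, 95, 160, 40]

# Censored: every observation is kept, but values above LIMIT are recorded as LIMIT,
# with a flag showing the true value is only known to exceed LIMIT.
censored = [(min(v, LIMIT), v > LIMIT) for v in true_values]  # (recorded value, censored?)

# Truncated: observations above LIMIT are not recorded at all.
truncated = [v for v in true_values if v <= LIMIT]
```

With censoring we still know how many observations exceeded the limit; with truncation even that information is lost.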
reproducibility
all the information needed to repeat each step of the research is provided, so a third party can start from scratch and produce the same results
replication
the data needed to generate the same results is provided
what does exploratory data analysis do
it analyzes the data to identify basic patterns or relationships, find the most important variables, and detect any errors in the data
exploratory data analysis on a single (univariate) variable
mean, median, quantiles, standard deviation (sd), skewness, dsb
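A minimal sketch of univariate summary statistics using Python's standard `statistics` module. `univariate_summary` is a hypothetical helper. Skewness has several definitions; this one uses the unadjusted moment coefficient (mean cubed deviation divided by the cube of the population standard deviation), and bias-adjusted versions also exist.

```python
import statistics

def univariate_summary(data):
    """Summary statistics for a single numeric variable."""
    n = len(data)
    mean = statistics.fmean(data)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles; default method is "exclusive"
    # Moment coefficient of skewness: > 0 means a longer right tail, < 0 a longer left tail.
    skewness = sum((x - mean) ** 3 for x in data) / n / statistics.pstdev(data) ** 3
    return {
        "mean": mean,
        "median": statistics.median(data),
        "quartiles": (q1, q2, q3),
        "sd": statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
        "skewness": skewness,
    }

symmetric = univariate_summary([1, 2, 3, 4, 5])
right_skewed = univariate_summary([1, 1, 2, 2, 3, 10])
```

The symmetric data set gives zero skewness, while the single large value in the second data set pulls the skewness above zero.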